CHAPTER 1. 

INTRODUCTION AND MOTIVATION 


The technical objective of this project has been to develop algorithms for 
compressing digital data that are compatible with the new generation of smart missiles 
and that can expand the normal operational role of such weapons to include a means for 
battle damage indication (BDI). A real-time BDI system would be an integral and 
effective tool for adaptive mission planning and retargeting during a strike, and it would 
supply the vital imagery required for subsequent rapid retargeting and mission planning. 
This project has developed the means for compressing digital video data by a large 
factor, ranging from an almost lossless compression ratio of 20:1 to a highly lossy ratio 
of 200:1. The 
compression task also includes compatible coding techniques that convert the compressed 
video images into an embedded bit stream for datalink transmission. An embedded bit 
stream is an ordered set of bits representing the compressed image as illustrated by 
Figure 1-1 (i.e., the bit stream is organized so that the most important bits can be 
transmitted first). Such embedded coders have many advantages for BDI applications, 
including their ability to generate fixed-rate bit streams without buffering and their 
suitability for error-prone communications channels. In order to ensure that the 
reconstructed BDI images meet the accuracy required by a human or computer 
interpreter, the encoder has been optimized to achieve the lowest bit rates for a wide 
range of operating conditions. The compression algorithm has also been designed to 
operate within the context of a proposed operational system, which must be a low-cost, 
missile-borne electro-optical (EO) camera or an infrared (IR) imaging array transmitting 
to a ground station via a datalink. The algorithm and its realization in signal processing 
hardware have been matched to the datalink bandwidths and transmission speeds expected in 
the year 2000. 
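The embedded property described above (most important bits first, so that any prefix of the stream is still decodable) can be illustrated with a toy example. The sketch below is not the REZW coder; it simply sends non-negative integers one bit plane at a time, most significant plane first. The function names `embed` and `decode_prefix` and the sample values are hypothetical.

```python
def embed(values, num_planes):
    """Emit the bits of non-negative integers, most significant bit plane first."""
    bits = []
    for plane in range(num_planes - 1, -1, -1):
        for v in values:
            bits.append((v >> plane) & 1)
    return bits


def decode_prefix(bits, count, num_planes):
    """Rebuild `count` values from however many leading bits were received."""
    values = [0] * count
    for idx, bit in enumerate(bits):
        plane = num_planes - 1 - idx // count  # which bit plane this bit belongs to
        values[idx % count] |= bit << plane
    return values


samples = [200, 13, 97, 5]            # illustrative 8-bit values
stream = embed(samples, 8)
coarse = decode_prefix(stream[:8], 4, 8)   # only the top two bit planes received
exact = decode_prefix(stream, 4, 8)        # the whole stream received
print(coarse)  # [192, 0, 64, 0]
print(exact)   # [200, 13, 97, 5]
```

Truncating the stream at any point yields a coarser approximation rather than a decoding failure, which is why such a coder can hit a fixed bit budget without buffering.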



FIGURE 1-1. Embedded Image Compression. (Axis label: Bits Received.) 

The BDI notional concept is a low-cost EO camera or IR imager transmitting to 
a primary receiving station via a datalink. The system definition has three primary 
components: 

1. Imaging sensor requirements 

2. Sensor platform requirements 

3. Datalink between the sensor platform and a receiving station 

Other factors such as weapon speed, warhead size, target set, final attack posture, 
and expected damage mechanism, along with sensor and platform requirements, have 
been reviewed and analyzed, and a configuration compatible with a Joint Stand-off 
Weapon (JSOW) submunition pack has been selected. Our review of existing and planned 
military and commercial datalinks led to the following conclusions: 

1. Either a simplified Link 16 protocol (e.g., the surgical strike datalink) channel 
relayed by aircraft or a commercial Low Earth Orbit data communication service (e.g., 
Iridium) would be suitable for full-motion video. 

2. Satellite communications (SATCOM) would be acceptable only for still frame 
transmission. 

3. A true Link 16 implementation is too complex. 

Because there is no universal metric for the utility of a given data compression 
algorithm when embedded in a communication system, the compression algorithms were 
designed within the context of the recommended BDI system. To achieve the very high 
compression ratios required and yet maximize video throughput and quality, we have 
implemented the China Lake-developed Robust Embedded Zerotree Wavelet (REZW) 
image compression algorithm on a Texas Instruments (TI) 320C80 multimedia video processor. 
The complete algorithm can precisely achieve any fixed bit rate (i.e., compression ratio) 
while still delivering state-of-the-art rate-distortion performance (i.e., low distortion at a 
specified bit rate) or, conversely, achieving any fixed distortion level (up to lossless) over 
an entire image or just over specific regions. With a single TI 320C80 processor 
operating at a clock rate of 40 megahertz and a compression ratio of 80:1, our coder 
maintains a frame rate (with 512x240 frames) of approximately 7 frames per second 
(fps). At compression ratios greater than 80:1, the throughput is higher (approximately 
15 fps at 200:1), and below 80:1 it is lower (approximately 3 fps at 20:1). However, 
because the algorithm is highly scalable, its speed increases linearly with the amount of 
processing power used. For example, we have also implemented the algorithm on a 
system having two TI 320C80 processors operating at 60 megahertz and have achieved a 
throughput of 20 fps at a compression ratio of 80:1. 
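The linear-scaling claim can be checked against the two reported operating points. The sketch below treats processing power as (number of processors) x (clock rate), which is our simplifying assumption, and also estimates the compressed data rate assuming 8 bits per pixel in the source frames (not stated in the text). The function names are hypothetical.

```python
def scaled_fps(base_fps, base_processors, base_clock_mhz, processors, clock_mhz):
    """Predict frame rate assuming throughput is proportional to processors x clock."""
    base_power = base_processors * base_clock_mhz
    power = processors * clock_mhz
    return base_fps * power / base_power


def compressed_bit_rate(width, height, bits_per_pixel, ratio, fps):
    """Compressed data rate in bits per second for a given compression ratio."""
    return width * height * bits_per_pixel / ratio * fps


# One 40-MHz processor gives about 7 fps at 80:1 (reported above).
predicted = scaled_fps(7.0, 1, 40.0, 2, 60.0)
print(predicted)  # 21.0, close to the 20 fps reported for two 60-MHz processors

# 512x240 frames, 8 bits per pixel (assumed), 80:1, 7 fps.
print(compressed_bit_rate(512, 240, 8, 80, 7))  # 86016.0 bits per second
```

The prediction of 21 fps against a measured 20 fps is consistent with nearly linear scaling, with a small loss that could plausibly come from coordination overhead between processors.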

One of our major milestones was the fiscal year 1998 demonstration of our 
intraframe encoder and decoder operating over an actual communications channel. While 
a military packet radio channel (e.g., Joint Tactical Information Distribution System 
(JTIDS) or Multifunctional Information Distribution System (MIDS)) would have been 
ideal for this purpose, we did not have access to such equipment at the time, nor did we 
have the financial resources to acquire it. Instead, we used a video broadcast over the 
Internet, which provided a test bed (1) to study the impact of data packetization on REZW bit stream robustness, (2) 
to illustrate systems tradeoffs including variable bandwidth, (3) to coordinate battle 
damage assessment requirements with the BDI rapid retargeting recommendations, and 
(4) to provide asynchronous transfer mode (ATM) technology directly relevant to Navy 
needs. 

The major challenge in fiscal year 1999 has been to preserve the error resilience of 
the REZW algorithm in the presence of the temporal differencing required to implement 
our low complexity interframe coding techniques in real time. We have solved this 
problem by using a “leaky” prediction model, thereby allowing the decoder to gradually 
forget errors. The resulting real-time video encoder provides at least twice as much 
resolution for a given compression ratio while only slightly slowing down the encoder (a 
1.5 fps drop at an 80:1 ratio). And even at very high error rates, we have found that the 
combination of REZW spatial compression and leaky temporal prediction gives excellent 
error resilience. 
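The way a leaky predictor lets the decoder forget errors can be seen in a scalar sketch. Here each frame is reduced to a single number, quantization is omitted, and the leak factor `alpha` and all values are illustrative assumptions; none of this is taken from the actual REZW video coder.

```python
def leaky_dpcm_error(num_frames, alpha, error_frame, error_value):
    """Return the decoder's reconstruction error per frame after one corrupted frame.

    The predictor is alpha times the previous reconstruction; alpha = 1.0 is
    ordinary temporal differencing and alpha < 1.0 is a leaky predictor.
    """
    frames = [100.0] * num_frames      # a static scene, for simplicity
    enc_prev = 0.0                     # encoder's copy of the previous reconstruction
    dec_prev = 0.0                     # decoder's copy of the previous reconstruction
    errors = []
    for t, x in enumerate(frames):
        residual = x - alpha * enc_prev            # what the encoder transmits
        enc_prev = alpha * enc_prev + residual     # encoder reconstruction (equals x)
        dec_recon = alpha * dec_prev + residual    # decoder reconstruction
        if t == error_frame:
            dec_recon += error_value               # a channel error hits this frame
        dec_prev = dec_recon
        errors.append(dec_recon - x)
    return errors


leaky = leaky_dpcm_error(10, 0.9, 2, 10.0)
plain = leaky_dpcm_error(10, 1.0, 2, 10.0)
print([round(e, 2) for e in leaky])   # error shrinks by a factor of 0.9 per frame
print([round(e, 2) for e in plain])   # error persists indefinitely
```

With `alpha` below 1 the error decays geometrically, at the price of a somewhat larger residual to transmit for each frame, which is the tradeoff a leaky model accepts in exchange for error resilience.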

In the following chapters, we focus primarily on the technological and scientific 
contributions that have resulted from this project, along with the real-time demonstration 
software implementing these concepts. First, however, Chapter 2 sets the stage for 
these discussions by describing a JSOW-compatible BDI imaging platform that was 
developed and simulated early in this project. Following this, Chapter 3 introduces a 
variety of image compression algorithms, characterizing their strengths and weaknesses 
for remote sensing applications like BDI and weapons control. Chapters 4 through 6 
discuss our REZW compression algorithm, parallelization of embedded coders for 
increased speed, and reduced-complexity motion compensated video compression. 
Chapters 7 and 8 discuss our real-time TI 320C80 implementation and describe the use of 
this software in detail. Chapter 9 presents our conclusions and future research directions. 

CHAPTER 2. 

IMAGING PLATFORM 


Early in this project, we investigated a number of ways in which an imaging BDI 
platform could be incorporated into an existing weapons system. For a variety of reasons, 
we quickly discarded tethered vehicles and balutes (a cross between a balloon and a 
parachute). A tethered vehicle did not allow sufficient time to transmit the acquired 
imagery, while the balute could not be deployed in such a way that it could view the 
weapon’s impact. Instead, we focused on a small glider having collapsible wings, which 
would replace two submunitions in a JSOW. Unfortunately, aerodynamic analysis 
showed that this glider was only marginally stable, but through further analysis we 
determined that flight stability could be increased by adding a small electric motor to the 
tail of the platform. This JSOW-compatible BDI imaging platform is shown in 
Figures 2-1 and 2-2. 

The vehicle was designed and analyzed aerodynamically using a higher-order panel 
method, PANAIR. This method does not include viscous or separation effects. VSAERO, 
a low-order panel method, includes an integral boundary layer analysis that provides 
viscous drag. This value was added to the inviscid drag value obtained from PANAIR. 
However, this approach does not give an adequate representation of the viscous flow 
field about the configuration. Because of its small size and low speed, the vehicle will 
experience very low Reynolds numbers. Because low-Reynolds-number aerodynamic 
properties are very sensitive to geometry and viscous conditions, computational fluid 
dynamics (CFD) studies were conducted to aid in their verification. These studies 
solved the viscous Navier-Stokes equations to model 
boundary layer growth and separation over the surface of the vehicle, particularly on the 
boattail and lifting surfaces. These studies considered the effects of the propeller by 
modeling it as an actuator disk; the actual propeller blades and rotation were not 
modeled. However, modeling the propeller as an actuator disk was sufficient to 
understand the effects of the propeller on the vehicle. Appendix A contains the data 
generated by these simulations. 

(b) Front view. 

FIGURE 2-1. Proposed BDI Platform. Measurements are in inches. 

(b) Isometric view with actuator disk. 

FIGURE 2-2. Additional Views of the Proposed BDI Platform. 
Measurements are in inches. 

The lateral aerodynamics of this vehicle could be analyzed using the less detailed 
PANAIR because solutions are required only at very small sideslip angles. Tailed 
vehicles that are laterally stable at small angles of sideslip tend to stay laterally 
stable at larger angles. Thus the sideslip angles necessary to analyze the lateral stability 
characteristics are on the order of 1 to 3 degrees. If it were considered necessary to 
analyze the vehicle at larger angles of sideslip or large angles of rudder deflection, it 
would be necessary to use a computational fluid dynamics (CFD) analysis. However, this 
is not the case. Figures 2-1 and 2-2 illustrate the tail configuration that was determined to 
give the best flight characteristics. Note that the vehicle has a vertical surface both above 
and below the body. Because of the wing/canard configuration, the vehicle’s center of 
gravity (CG) is considerably farther aft than that of a typical flight vehicle. Thus the 
moment arm is considerably shorter for the vertical tail, which necessitates a greater 
vertical tail surface area. In addition, because this vehicle is rudder controlled (top 
vertical tail only), rudder deflections must induce a roll into the turn. This required 
the top vertical tail to be larger than the bottom one and the wings to have dihedral. 
With this combination, if the rudder deflects the nose to the left, the 
wings also bank to the left, initiating a smoothly controlled left turn. Figures 2-3 and 
2-4 illustrate the yawing and rolling moments for this configuration. The data show a 
positive slope for the yawing moment and a negative slope for the rolling moment, which 
indicate a laterally stable configuration. 

The CFD analysis provided both an improved estimate of the basic aerodynamic 
forces and moments and also a detailed understanding of the flow field about the full 
configuration. To lower the cost of the analysis, the vertical fins and the left 
half of the vehicle were not modeled, which substantially reduced both the grid 
generation time and the computer run times. As a result 
of the simplified modeling, only longitudinal aerodynamics could be analyzed. But this 
was sufficient because lateral aerodynamics can be predicted adequately using PANAIR, 
which is considerably less costly. Force and moment results of the PANAIR and CFD 
analyses are presented in Figures 2-5 through 2-8. 

Based on this and other analyses, the proposed imaging platform appears to be both 
stable and controllable while delivering sufficient loitering time above the target area. 
However, to be completely sure, one must also perform wind tunnel tests to validate the 
simulation results. Because of budget constraints, we were not able to perform such tests. 

FIGURE 2-5. Lift Coefficient Comparison of the Navier-Stokes Solutions With and 
Without the Actuator Disk at Sea Level, With Turbulent Boundary Layer. 

FIGURE 2-6. Drag Coefficient Comparison of the Navier-Stokes Solutions With 
and Without the Actuator Disk at Sea Level, With Turbulent Boundary Layer. 

FIGURE 2-7. Pitching Moment Coefficient Comparison of the Navier-Stokes 
Solutions With and Without the Actuator Disk at Sea Level, With Turbulent 
Boundary Layer. 

FIGURE 2-8. Lift-to-Drag Coefficient Comparison of the Navier-Stokes Solutions 
With and Without the Actuator Disk at Sea Level, With Turbulent Boundary Layer. 

CHAPTER 3. 

IMAGE AND VIDEO COMPRESSION FOR REMOTE SENSING 

SUMMARY 

The goal of this chapter is to familiarize the reader with the major image and video 
compression technologies in the context of weapons control (WC) and BDI applications. 
Toward this goal, we discuss the operation and limitations of the four major still image 
compression techniques: discrete cosine transform (DCT), wavelet transform (WT), 
vector quantization, and fractal. We then consider the broader problem of implementing 
video compression algorithms, which use the previously discussed still image 
compression algorithms as basic building blocks. Some of the techniques examined here 
are motion estimation, motion compensation for prediction and interpolation, and three- 
dimensional (3-D) subband coding. Finally, we discuss the major characteristics of digital 
video compression and transmission systems, concentrating on the impact they have on 
WC and BDI system performance. Many of these characteristics are desirable in every 
compression application (e.g., good image quality) but some of them are especially 
important for low-cost, remote-sensing platforms (e.g., a low-complexity video encoder). 
In addition, some characteristics such as video latency are important for the WC 
application but not for the BDI application. Within this framework, we discuss the 
various image and video compression technologies available and highlight those that are 
particularly good or bad for WC/BDI applications. 


INTRODUCTION 


Recent advances in computational hardware and signal processing theory have made 
the transmission of real-time digital video possible. Thus far, there have been two driving 
applications in this field: (1) video broadcast via such media as compact disc read-only 
memory (CD-ROM), terrestrial radio frequencies (RF), and satellite RF; and (2) video 
teleconferencing. The key paradigm for video broadcasting, which is epitomized by the 
MPEG standards (References 3-1 and 3-2), is to make the receiver as cheap as possible 
because there are many receivers but only a few transmitters. For video teleconferencing, 
on the other hand, the goal is to make the system as symmetric as possible because every 
system must be able to both transmit and receive video in real time. In addition, broadcast 
video is generally expected to be of very high quality (for example, Moving Picture Experts 
Group (MPEG) 1 video should look at least as good as that of a standard video cassette 
recorder (VCR)), while a considerable amount of degradation is accepted in 
teleconferencing video. 

The paradigms for military remote-sensing video are considerably different from 
those of either broadcast or teleconferencing video. First and foremost, the transmitter 
and, consequently, the video encoder, must be as low in complexity as possible, while the 
decoder complexity is far less important. Reducing the complexity of the encoding 
translates into reduced space, weight, and power requirements on the sensor platform. At 
the very least, relaxing these requirements reduces the cost of the complete system, but it 
can also be the enabling factor that makes the system possible in the first place. The issue 
of transmitter complexity and its associated costs becomes even more important for 
WC/BDI applications, where the transmitter is expendable. Currently, analog video 
datalinks derived from the National Television System Committee (NTSC) television broadcast 
standard are used, but these suffer from a variety of shortcomings including low 
resolution, jamming/intercept vulnerability, and a lack of flexibility to varying video data 
types (e.g., laser radar (Ladar), synthetic aperture radar (SAR), and imaging infrared 
(IIR)). Digital video transmission can overcome all of these shortcomings, albeit at a 
higher transmitter cost. 

In this paper, we study a number of compression technologies and evaluate their 
suitability for WC/BDI applications. Below we introduce the notation and terminology 
used throughout this paper, and then discuss the four most common still frame 
compression technologies. Still frame compression is studied both for its own sake and 
because it forms the basis of the video compression strategies also discussed in this 
chapter. We then analyze the ability of the various algorithms presented in the earlier 
sections to satisfy the WC/BDI compression requirements. Finally, conclusions are 
presented. 


NOTATION AND TERMINOLOGY 


Throughout this technical paper, the terms compression and coding are synonymous 
and will be used interchangeably. We use the word coding to describe the compression 
process because an actual bit stream describing the image is output by the compression 
algorithm. Historically, this process is referred to as source coding, a term that 
originates from Shannon’s original work on information theory (Reference 3-3). 
Consistent with this terminology, we refer to the part of the algorithm that performs 
compression as the encoder and the part that performs decompression as the decoder. 

In general, there are two forms of coding, lossless and lossy. As the names suggest, 
lossless coding implies that the image produced by the decoder is identical to that which 
went into the encoder. Unfortunately, the compression ratio that one can achieve using 
lossless coding is image dependent and is generally not more than 4:1 (4 times fewer bits 
in the compressed image than were in the original). Lossy coding, on the other hand, 
implies a loss of information in the coding process, leading to a degradation in the quality 
of the decoded image. To accurately characterize the quality of a lossy coder, one must 
specify a rate-distortion curve (i.e., the bit rate (compression ratio) of the coded image 
versus the distortion introduced by the coding process). This distortion is typically 
measured by either mean squared error (MSE), defined as 

$$\mathrm{MSE} = \frac{1}{XY} \sum_{m=0}^{X-1} \sum_{n=0}^{Y-1} \left| x(m,n) - \hat{x}(m,n) \right|^2 \qquad \text{(3-1)}$$

or peak signal-to-noise ratio (PSNR), which is generally given in decibels and is defined 
by 

$$\mathrm{PSNR} = 20 \log_{10} \left( \frac{255}{\sqrt{\mathrm{MSE}}} \right) \qquad \text{(3-2)}$$

In Equation 3-1, the original and coded images are given by $x(m,n)$ and $\hat{x}(m,n)$, 
respectively, and both are X by Y pixels in size. Thus, in later sections when we discuss 
the rate-distortion performance of a coding algorithm, we are simply referring to the 
average distortion introduced into the output image, as measured by Equations 3-1 and 
3-2, compared to the number of bits used to represent it in coded form. 
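As a concrete illustration of these two distortion measures, the following minimal sketch computes MSE and PSNR for images stored as lists of rows. The peak value of 255 assumes 8-bit imagery, and the function names and sample pixel values are ours, chosen only for illustration.

```python
import math


def mse(original, coded):
    """Mean squared error between two equally sized images (lists of rows)."""
    rows = len(original)
    cols = len(original[0])
    total = sum(
        abs(original[m][n] - coded[m][n]) ** 2
        for m in range(rows)
        for n in range(cols)
    )
    return total / (rows * cols)


def psnr(original, coded, peak=255.0):
    """Peak signal-to-noise ratio in decibels; `peak` assumes 8-bit pixels."""
    err = mse(original, coded)
    if err == 0:
        return math.inf  # identical images (lossless coding)
    return 20.0 * math.log10(peak / math.sqrt(err))


original = [[0, 255], [128, 64]]
coded = [[0, 253], [130, 64]]
print(mse(original, coded))             # 2.0
print(round(psnr(original, coded), 1))  # about 45.1 dB
```

Plotting either measure against the number of bits spent, over a range of compression ratios, gives the rate-distortion curve discussed above.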


IMAGE COMPRESSION 


OVERVIEW 

The objective of still image compression or coding is to reduce the number of bits of 
information needed to represent a given image by eliminating redundancy in the image 
(lossless compression) and by introducing distortion into the image in a manner that is 
acceptable to the viewer (Reference 3-4). While lossless compression is desirable, the 
amount of bit rate reduction it can achieve depends on the entropy or information content 
of the image. Allowing distortion to be added to the image by the compression process 
results in a loss of information, but it also makes it possible to achieve arbitrarily high 
compression ratios at the cost of accepting large amounts of distortion. Thus any lossy 
image compression algorithm must be characterized by a rate versus distortion curve in 
order to properly quantify its quality. 

In recent years, four major coding techniques have dominated the engineering 
literature, and all of these have become commercially available in some form. The four 
major approaches are DCT, WT, vector quantization (VQ), and fractal. Some of these 
techniques can also be combined to create a hybrid coding algorithm; the most common 
approach combines a transform with VQ. In this work, we give only a brief overview of 
each of these methods, focusing on characteristics that affect their utility in WC/BDI 
applications. 

DISCRETE COSINE TRANSFORM 


Coders based on the DCT are by far the most pervasive today. Included in this group 
are the MPEG 1 and 2 standards for broadcast video (References 3-1 and 3-2), the CCITT 
H.261 standard for video teleconferencing (Reference 3-5), and the Joint Photographic 
Experts Group (JPEG) standard for still image compression (Reference 3-6). For these 
coding algorithms, a separable 2-D DCT is performed on each 8x8 block of image pixels. 
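Specifically, the forward transform of the DCT is defined by the JPEG standard as 

$$F(u,v) = \frac{1}{4} C(u) C(v) \sum_{j=0}^{7} \sum_{k=0}^{7} f(j,k) \cos\left[ \frac{(2j+1)u\pi}{16} \right] \cos\left[ \frac{(2k+1)v\pi}{16} \right] \qquad \text{(3-3)}$$

where f(j,k) is an 8x8 block of image pixels and 

$$C(x) = \begin{cases} 1/\sqrt{2} & \text{if } x = 0 \\ 1 & \text{otherwise} \end{cases} \qquad \text{(3-4)}$$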


The inverse transform is similarly defined with the roles of (j,k) interchanged with 
those of (u,v). Using these relatively small blocks reduces the amount of computation 
required and preserves a reasonable amount of spatial locality in the transform domain. 
This second point is important because the underlying statistics of a typical image are 
highly nonstationary and therefore, to maximize coding efficiency, bits must be allocated 
spatially across the image in a nonuniform manner. In addition, because non-overlapping 
blocks of the image are transformed, the entire process is easily parallelized. However, the 
use of non-overlapping blocks is also the Achilles heel of such algorithms at low bit rates 
because block boundaries begin to appear in the reconstructed image. These artifacts 
begin to emerge in JPEG-coded images as the compression ratio is increased above 20:1. 
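For readers who prefer code to formulas, a direct (unoptimized) evaluation of the 8x8 forward DCT can be sketched as follows. It uses the usual JPEG convention, assumed here, in which the scale factor C(x) is 1/sqrt(2) for x = 0 and 1 otherwise and the double sum is multiplied by 1/4. A real encoder would use a fast factorization rather than this quadruple loop.

```python
import math


def c(x):
    """Scale factor: 1/sqrt(2) for the zero-frequency index, 1 otherwise."""
    return 1.0 / math.sqrt(2.0) if x == 0 else 1.0


def forward_dct_8x8(block):
    """Direct evaluation of the 2-D DCT of one 8x8 block (list of 8 rows)."""
    coeffs = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = 0.0
            for j in range(8):
                for k in range(8):
                    total += (
                        block[j][k]
                        * math.cos((2 * j + 1) * u * math.pi / 16)
                        * math.cos((2 * k + 1) * v * math.pi / 16)
                    )
            coeffs[u][v] = 0.25 * c(u) * c(v) * total
    return coeffs


flat = [[1.0] * 8 for _ in range(8)]   # a featureless block
result = forward_dct_8x8(flat)
print(round(result[0][0], 6))          # 8.0: all energy lands in the DC coefficient
```

Because each block is transformed independently, blocks can be handed to separate processors with no communication between them, which is the parallelism noted above.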


Figure 3-1 is a block diagram of the complete transform-based image encoder in its 
simplest form. In the figure, each block of the image is first converted into its frequency 
components by the transform (defined by Equation 3-3 for the DCT). After 
transformation, the blocks of coefficients are quantized, with varying numbers of bits 
allocated to different frequency components. The distribution of bits among the 
frequency components can be either fixed for all time or can be adapted and transmitted 
as side information; this second approach generally provides superior performance. After 
quantization, the coefficients are losslessly encoded either by using run-length and 
Huffman coding or by using an arithmetic coder. This lossless coding will eliminate any 
remaining statistical redundancy in the image; thus if the corruption of 1 bit during 
transmission causes the reconstructed image to degrade into white noise, the encoder has 
done an excellent job. It should be noted that the use of lossless coding makes this 
algorithm inherently variable in bit rate. Thus getting the compressed image through a 
fixed bit rate channel requires either that we iterate the coding process (changing a 
quality factor until the desired bit rate is achieved) or that we use a large buffer with 
some form of feedback control from the quantizer. This latter approach is the one taken in 
the MPEG standards. 


FIGURE 3-1. Image Encoder Block Diagram. (Input: Image; output: Compressed Bit Stream.) 
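The quantization and run-length stages of such an encoder can be sketched in a few lines. This is a simplified illustration only: it uses a single uniform step size for all coefficients (real coders allocate bits per frequency component), and the `"EOB"` end-of-block marker, function names, and sample values are our own assumptions.

```python
def quantize(coeffs, step):
    """Uniform quantization: map each coefficient to the nearest integer level."""
    return [int(round(value / step)) for value in coeffs]


def run_length_encode(levels):
    """Encode levels as (zero_run, nonzero_level) pairs, ending with an EOB marker."""
    pairs = []
    run = 0
    for level in levels:
        if level == 0:
            run += 1              # extend the current run of zeros
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")           # trailing zeros are implied by end-of-block
    return pairs


coeffs = [80.0, -21.0, 3.0, 0.5, 9.0, -1.0, 0.2, 0.0]   # illustrative values
levels = quantize(coeffs, 8.0)
print(levels)                     # [10, -3, 0, 0, 1, 0, 0, 0]
print(run_length_encode(levels))  # [(0, 10), (0, -3), (2, 1), 'EOB']
```

The number of pairs produced depends on the image content, which is why the output bit rate of this kind of coder varies from block to block and from frame to frame.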


In the WC/BDI problem, DCT-based methods have much to offer. First and 
foremost, the complexity of a still-frame DCT-based encoder is very low because the 
complexity of the 8x8 DCT is very low. In addition, the algorithm scales linearly with the 
dimensions of the image, and it is highly parallelizable. Finally, a great deal of available 
commercial hardware can be easily leveraged for military applications. The only real 
drawback of a DCT-based coding algorithm is its performance at low bit rates: the 
boundaries between the 8x8 blocks appear in the reconstructed image. Consequently, for 
low bit rate transform coding, transforms that maintain spatial locality and yet have 
overlapped blocks are preferable. Such transforms are often called modified lapped 
transforms (MLTs) or multirate filter banks (Reference 3-7), and certain members of this 
general class can be used to efficiently implement discrete wavelet and wavelet-packet 
decompositions. 


WAVELET AND WAVELET-PACKET TRANSFORMS 

Wavelet and wavelet-packet transforms are two members of the general class of 
subband coding methods. While other forms of subband coding have been discussed in 
the literature at great length (Reference 3-8), a general consensus has formed over the 
past few years that wavelet and wavelet-packet techniques offer the best in coding 
performance. 

Figure 3-2 is the block diagram of one level of a 2-D wavelet transform, where L and 
H indicate lowpass and highpass filters, respectively, and the subscripts v and h indicate 
the direction in which the filter operates (vertical or horizontal). The downward-pointing 
arrow followed by the 2 represents the downsampling operation in which every other 
sample point is discarded (this maintains the same sampling density in the wavelet 
domain as existed in the original image). To get a true wavelet transformation, the 
decomposition of Figure 3-2 is iteratively applied, first to the original image and then to 
the low-low band at each successive level until, in the limiting case, there is only one 
pixel remaining in the final low-low band. Figure 3-3 shows a mapping of wavelet 
coefficients for a 3-level wavelet decomposition, where the two capital letters indicate the 
frequency band (e.g., HL corresponds to a high-low band) and the subscript indicates the 
scale of that band. In this figure, lower scale numbers correspond to finer edge features in 
the image, while higher numbers correspond to subsampled coarse features. Note that all 
of the bands represent different resolutions of edge features except the final low-low 
band, which contains all of the remaining low frequency image content. The wavelet 
coefficient mapping illustrated in Figure 3-3 can be converted perfectly back into the 
original image if the correct filters L and H are used in Figure 3-2. The conditions on 
these filters were first presented in Reference 3-9 and later used by Daubechies in her 
compactly supported (i.e., having a finite impulse response filter bank implementation) 
orthogonal wavelets (Reference 3-10). In order to show the equivalence between this 
discrete wavelet transform (DWT) and the continuous wavelet transform, Daubechies 
required her filters to have an additional quality called regularity, which is somewhat 
related to the number of zeros that filter L has in the frequency domain at π. By relaxing the 
orthogonality constraint, researchers have also created perfect-reconstruction biorthogonal 
wavelets and filter banks (Reference 3-11), and these have been shown in recent years to be 
the most effective for image compression applications (Reference 3-12). 
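The filter-and-downsample structure described above can be sketched with the two-tap Haar filters, chosen here only because they are the simplest perfect-reconstruction pair; they are not the biorthogonal filters favored in this report. The band-naming convention (first letter horizontal, second letter vertical) and all function names are our assumptions, and the image dimensions must be divisible by 2 at every level.

```python
import math


def analyze_1d(signal):
    """Haar lowpass and highpass filtering, each followed by downsampling by 2."""
    s = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low, high


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def filter_rows(image):
    """Apply the 1-D analysis to every row; return the low and high half-images."""
    lows, highs = [], []
    for row in image:
        low, high = analyze_1d(row)
        lows.append(low)
        highs.append(high)
    return lows, highs


def filter_columns(image):
    """Vertical filtering, done by transposing, filtering rows, and transposing back."""
    low, high = filter_rows(transpose(image))
    return transpose(low), transpose(high)


def wavelet_level(image):
    """One four-band decomposition: horizontal filtering, then vertical filtering."""
    low_h, high_h = filter_rows(image)
    ll, lh = filter_columns(low_h)
    hl, hh = filter_columns(high_h)
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}


def wavelet_transform(image, levels):
    """Iterate the decomposition on the low-low band; return final LL and detail bands."""
    details = []
    current = image
    for _ in range(levels):
        bands = wavelet_level(current)
        details.append({name: bands[name] for name in ("LH", "HL", "HH")})
        current = bands["LL"]
    return current, details


flat = [[1.0] * 4 for _ in range(4)]
final_ll, details = wavelet_transform(flat, 2)
print(final_ll)   # a single coefficient close to 4.0; every detail band is zero
```

For a featureless image all of the energy ends up in the final low-low band, which matches the description of the detail bands as carrying edge information at different resolutions.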




FIGURE 3-2. Four-Band, 2-D Wavelet Decomposition. (Output bands: low-low, low-high, 
high-low, and high-high.) 

Early wavelet compression algorithms followed the same format as the DCT-based 
coder illustrated in Figure 3-1: transform, quantize, and losslessly encode 
(Reference 3-13). This methodology generated better results with a wavelet transform 
than with a DCT, but the difference was only on the order of a few decibels in PSNR 
improvement. It was not until the advent of the Embedded Zerotree Wavelet (EZW) 
algorithm developed by Shapiro (Reference 3-14) that a vast improvement in still frame 
compression quality was realized over DCT-based methods like JPEG. The most 
important concept introduced in this algorithm is that of forming zerotrees of wavelet 
coefficients. This idea makes it possible to efficiently exploit the correlation between 
insignificant coefficients at different scales. This basic concept has since been further 
optimized using an entropy-constrained quantization approach to achieve some of the 
best wavelet coding results published to date (Reference 3-15). 
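The zerotree idea can be made concrete with a small test for whether a coefficient and all of its descendants at finer scales are insignificant with respect to a threshold. The sketch assumes the conventional pyramid layout of a square coefficient array, in which the children of a detail coefficient at (r, c) sit at rows 2r, 2r+1 and columns 2c, 2c+1; that indexing, the function names, and the sample values are our assumptions and are not taken from the REZW implementation. Coefficients in the final low-low band follow a different parent-child rule and are not handled.

```python
def children(r, c, size):
    """Children of a detail coefficient in a size x size pyramid layout."""
    if r == 0 and c == 0:
        raise ValueError("low-low band coefficients are not handled in this sketch")
    candidates = [(2 * r, 2 * c), (2 * r, 2 * c + 1),
                  (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]
    return [(i, j) for i, j in candidates if i < size and j < size]


def heads_zerotree(coeffs, r, c, threshold):
    """True if the coefficient at (r, c) and all its descendants are insignificant."""
    if abs(coeffs[r][c]) >= threshold:
        return False
    return all(
        heads_zerotree(coeffs, i, j, threshold)
        for i, j in children(r, c, len(coeffs))
    )


coeffs = [
    [50, 3, 1, 0],
    [2, 9, 2, 1],
    [1, 0, 12, 0],
    [0, 1, 0, 1],
]
print(heads_zerotree(coeffs, 0, 1, 8))   # True: one symbol can stand for the whole tree
print(heads_zerotree(coeffs, 1, 1, 10))  # False: a descendant (12) is significant
```

When the test succeeds, a single zerotree symbol replaces every coefficient in the tree, which is where the coding gain from cross-scale correlation comes from.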

A wavelet-packet decomposition uses the basic filtering process described by 
Figure 3-2, but instead of successively subdividing the low-low band, it adapts the 
decomposition to the image using some criterion. The first work in this field used entropy 
to determine whether a given band was further subdivided (Reference 3-16), while later 
work optimized the decomposition in a rate-distortion sense (Reference 3-17). These 
methods are very computationally expensive compared to a standard wavelet transform 
because of the added need to adapt the decomposition, but when combined with the 
zerotree concept, they deliver excellent rate-distortion performance on difficult images 
(Reference 3-18). 

In terms of the objective PSNR (or equivalently MSE) distortion measure, wavelet- 
based coding algorithms outperform DCT-based algorithms at all bit rates. This 
difference, however, only becomes visually perceptible at lower bit rates and even this is 
mostly because of the serious blocking artifacts that appear in the DCT-coded image. 
Thus for higher bit rate applications (e.g., less than 20:1 compression ratios), there is little 
perceptible advantage to using a wavelet-based coder versus a DCT-based coder, but 
there is a considerable increase in computational complexity. The computational 
complexity of a DWT for an NxN image is $kN^2$, where the constant k depends on the 
exact filters used in Figure 3-2 and the number of levels of the wavelet decomposition. 
For the very low complexity 5/3 biorthogonal wavelet (Reference 3-11) with 5 levels of 
decomposition, k is 30, while for the allpass wavelet (the most computationally efficient 
wavelet having good filter quality) it is 7 (Reference 3-19). The complexity of the 
block-based DCT, on the other hand, is also $kN^2$ but with k less than 1, giving it a considerable 
computational advantage. At very low bit rates, however, the perceptual quality of a 
wavelet-coded image is far better than that of a block-based DCT. In such situations, the 
image coded with the DWT is generally blurred, suffering from ringing or blocking 
artifacts at sharp edges within the image. The exact nature and extent of these artifacts 
depend on the lengths and smoothness of the wavelet filters used in the decomposition. 
For shorter biorthogonal wavelets, such distortions are not severe, and reasonable image 
quality can often be attained down to compression ratios of 100:1. 


VECTOR QUANTIZATION METHODS 

A Vector Quantizer groups L input samples (in this case pixels) together into an 
input vector x and measures the Euclidean distance, 

    d = \sum_{k=0}^{L-1} \bigl( x(k) - c_i(k) \bigr)^2        (3-3)


between that vector and a set of codevectors, c_i, in what is called the codebook. The 
index i of the codevector with the smallest Euclidean distance (i.e., the one most similar 
to the input vector) is then transmitted to the receiver, which then looks up this index in 
its own codebook to find an approximation to the original input vector. The encoder and 
decoder for this process are illustrated in Figure 3-4. Note that the encoder and decoder 
must have access to exactly the same codebooks and that the quality of this method is 
directly dependent on the quality of the codebook. 
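
The encode and decode steps described above can be sketched in a few lines. This is a minimal illustration rather than the implementation used in this work: the function names, the toy four-entry codebook, and the 4-sample input vector are all assumptions chosen for readability. The distance follows Equation 3-3.

```python
# Minimal full-search VQ sketch. All names and the toy codebook are
# illustrative assumptions, not taken from the report.

def squared_distance(x, c):
    """Distance of Equation 3-3 between input vector x and codevector c."""
    return sum((xk - ck) ** 2 for xk, ck in zip(x, c))

def vq_encode(x, codebook):
    """Full search: return the index i of the codevector closest to x."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(x, codebook[i]))

def vq_decode(index, codebook):
    """The decoder is a table lookup in its own copy of the codebook."""
    return codebook[index]

# Encoder and decoder must hold exactly the same codebook.
codebook = [
    (0, 0, 0, 0),
    (255, 255, 255, 255),
    (0, 0, 255, 255),
    (255, 255, 0, 0),
]
block = (10, 5, 240, 250)           # L = 4 input samples
index = vq_encode(block, codebook)  # only this index is transmitted
approximation = vq_decode(index, codebook)
```

Only the index crosses the channel, so the bit cost per vector is fixed by the codebook size, while the reconstruction error is fixed by how well the codebook matches the input.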


FIGURE 3-4. Vector Quantization-Based Encoder and Decoder. 

The motivation for this technique comes from an observation in Shannon’s landmark 
paper (Reference 3-3) that by allowing the size of the coded vector to grow to infinity, 
the efficiency of the coder could achieve the theoretical bound (i.e., the vector could be 
coded with the absolute minimum number of bits). Unfortunately, infinite length vectors 
are difficult to work with in practice, and this theoretical observation provides no clues as 
to how one might actually design the vector quantizer. An algorithm called the 
generalized Lloyd or LBG (Linde, Buzo, and Gray) algorithm has been developed to design VQ 
codebooks, but it is only guaranteed to converge to a locally optimal solution 
(Reference 3-20). As the dimension of the input vector and the size of the codebook 
increase, it becomes more and more difficult to find good solutions using this algorithm. 
This problem becomes especially obvious when one directly applies VQ to image 
compression because the input vectors are reasonably large (typically 4x4 or 8x8 groups 
of pixels) and the codebooks are very large (many thousands of codevectors). Even 
worse, however, is the huge variety of possible input sequences. The VQ codebook 
design process tries to distill out of a sequence of training vectors the essential “truths” of 
that sequence (i.e., the fundamental feature blocks that best describe a set of images). 
Unfortunately, the space of all possible image sequences is simply too broad for such a 
method to be effective. On smaller, narrowly defined image sets, though, such techniques 
may prove to be very useful. 
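
The codebook design loop can be illustrated with a small sketch of the generalized Lloyd iteration: partition the training vectors by nearest codevector, then move each codevector to the centroid of its cell. The function names, the 2-D training set, and the choice to leave an empty cell's codevector unchanged are assumptions made for this example. As noted above, the result depends on the initial codebook and is only locally optimal.

```python
# Sketch of generalized Lloyd (LBG-style) codebook design.
# Names and the toy training set are illustrative assumptions.

def squared_distance(x, c):
    return sum((a - b) ** 2 for a, b in zip(x, c))

def lloyd_iteration(training, codebook):
    """One step: nearest-neighbor partition, then centroid update."""
    cells = [[] for _ in codebook]
    for x in training:
        i = min(range(len(codebook)),
                key=lambda j: squared_distance(x, codebook[j]))
        cells[i].append(x)
    updated = []
    for c, cell in zip(codebook, cells):
        if cell:
            updated.append(tuple(sum(x[k] for x in cell) / len(cell)
                                 for k in range(len(c))))
        else:
            updated.append(c)  # assumption: keep an unused codevector as is
    return updated

def total_distortion(training, codebook):
    return sum(min(squared_distance(x, c) for c in codebook) for x in training)

def design_codebook(training, initial_codebook, max_iterations=50):
    codebook = list(initial_codebook)
    for _ in range(max_iterations):
        updated = lloyd_iteration(training, codebook)
        if updated == codebook:  # partition and centroids stopped changing
            break
        codebook = updated
    return codebook

training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
            (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
initial = [(0.0, 0.0), (1.0, 0.0)]
codebook = design_codebook(training, initial)
```

On this toy set the two codevectors settle on the centroids of the two clusters. With image blocks, the same loop runs over many thousands of 16- or 64-dimensional training vectors, which is where the difficulties described above arise.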

In order to overcome some of the limitations of this unstructured VQ approach, 
many researchers have studied ways to add structure to the VQ coder. Some approaches 
that have been found to be useful for image compression include mean-gain-shape VQ 
(Reference 3-21), transform VQ (References 3-22 and 3-23), product code VQ 
(Reference 3-24), and multiresolutional VQ (Reference 3-25). All these techniques allow 
a priori knowledge about the input source to be included into the coding model, 
simplifying and robustifying the design. In addition, if the VQ is forced to have a binary, 
tree-structured codebook, the search complexity for a codebook of size M can be reduced 
from M Euclidean distance comparisons to just log_2 M. The decoder operation is simply a 
table lookup operation, so the encoder search represents all of the complexity in a 
conventional VQ implementation. From the perspective of the WC/BDI problem, it is 
unfortunate that most of the complexity resides in the expendable encoder. If vectors of 
size 4x4 are used, each Euclidean distance calculation requires 32 operations, making the 
total complexity 2 · M · N^2 operations for an image of size NxN. While this is quite high 
(M is generally greater than 1,000), the structure of this search process is very regular 
and, consequently, well suited to very large-scale integration (VLSI) implementations. 
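
The operation counts quoted above can be checked with a short calculation. The image size N = 512 and codebook size M = 1024 are assumed values chosen only for illustration, and the tree-structured count takes the text's figure of log_2 M distance computations per vector at face value (M is assumed to be a power of two).

```python
import math

# Worked operation counts for VQ encoding. N = 512 and M = 1024 are
# illustrative assumptions, not values taken from the report.

def full_search_ops(n, m, block=4):
    """Operations to code an n x n image with an m-entry codebook."""
    ops_per_distance = 2 * block * block     # 32 for 4x4 vectors
    blocks = (n * n) // (block * block)      # number of input vectors
    return ops_per_distance * m * blocks     # equals 2 * m * n^2

def tree_search_ops(n, m, block=4):
    """Same count when a binary tree needs only log2(m) distances."""
    ops_per_distance = 2 * block * block
    blocks = (n * n) // (block * block)
    return ops_per_distance * int(math.log2(m)) * blocks

full_ops = full_search_ops(512, 1024)   # 2 * 1024 * 512^2
tree_ops = tree_search_ops(512, 1024)   # 2 * 10 * 512^2
```

Under these assumptions the full search needs roughly 537 million operations per frame and the tree search about 5.2 million, a reduction by the factor M / log_2 M.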

Despite their theoretical advantages, the performance of most VQ coder algorithms 
on images taken from outside their training sets lags that of transform-based coders. It 
should be noted, however, that vector quantization is actually a superset of all of the other 
image coding techniques discussed in this paper. For example, a coder with identical rate- 
distortion performance to the JPEG coder can be implemented using a full search VQ on 
8x8 blocks of image pixels, with the DCT, quantization, and lossless compression 
implicitly incorporated into the VQ codebook. Even fractal coding, which is discussed 
below, can be implemented in a VQ framework. Of course, it is generally more efficient 
to implement transform and fractal coders explicitly than it is to use a VQ formulation. 


FRACTAL METHODS 

Fractal coding methods are based on the assumption that natural images have fractal 
characteristics (i.e., that the same basic features appear in different spatial locations and 
at different scales throughout the image). While a fractal coder also uses transformations, 
it differs greatly from the coders discussed above because the transforms themselves and not 
the transform coefficients are what is transmitted to the decoder. A fractal-based 
encoding algorithm is shown in Figure 3-5. Basically, a number of mappings must be 
devised that map all of the blocks of the image to other, smaller blocks of the image. 
Going from large to small blocks results in a contractive mapping, which is a necessary 
condition to ensure convergence in the decoder when the mapping is iterated. 
Overlapping is allowed in both the domain (original) blocks and the range (translated and 
contracted) blocks; the Collage Theorem is used to smoothly fuse blocks together. If the 
mappings are designed properly, an approximation of the original image can be 
reconstructed in the decoder by iteratively implementing these mappings (starting from 
any arbitrary image) until convergence is achieved. An excellent, in-depth treatment of 
fractal coder design is presented in Reference 3-26. 
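
The decoder's behavior can be illustrated with a toy one-dimensional sketch. Everything here is an assumption made for illustration: the 8-sample signal, the range blocks of 2 samples, the domain blocks of 4 samples contracted by pairwise averaging, and the particular (domain start, scale, offset) triples. The point is only that iterating contractive mappings converges to the same signal from any starting point.

```python
# Toy 1-D illustration of iterated contractive block mappings.
# Block sizes and mapping parameters are illustrative assumptions.

def apply_mappings(signal, mappings, range_size=2):
    """Apply every mapping once. Each mapping (domain_start, scale, offset)
    fills one range block from a domain block twice as long."""
    out = []
    for domain_start, scale, offset in mappings:
        domain = signal[domain_start:domain_start + 2 * range_size]
        # Spatial contraction: average neighboring pairs of samples.
        shrunk = [(domain[2 * k] + domain[2 * k + 1]) / 2
                  for k in range(range_size)]
        out.extend(scale * v + offset for v in shrunk)
    return out

def fractal_decode(mappings, start, iterations=60):
    signal = list(start)
    for _ in range(iterations):
        signal = apply_mappings(signal, mappings)
    return signal

# Only these parameters would be transmitted; every |scale| is below 1.
mappings = [(0, 0.5, 10.0), (4, 0.5, 20.0), (2, -0.25, 40.0), (0, 0.75, 5.0)]

from_zeros = fractal_decode(mappings, [0.0] * 8)
from_other = fractal_decode(
    mappings, [255.0, 0.0, 128.0, 7.0, 99.0, 3.0, 200.0, 50.0])
```

Both starting signals end at the same fixed point to within numerical precision. The encoder's much harder job, not shown here, is to find mappings whose fixed point approximates a given image.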

FIGURE 3-5. Block Mappings for an Iterative Fractal Encoder. 

While fractal coding algorithms are among the most mathematically elegant, they do 
have their drawbacks. The first and foremost is that their encoder complexities are 
extremely high. By contrast, the complexity of a fractal decoder is almost trivial, but, 
unfortunately, this asymmetry is exactly the opposite of what is needed for still frame 
compression in WC/BDI applications. In addition, the rate-distortion performance of 
fractal coders has been found to be generally inferior to that of DCT- and wavelet-based 
coders. Some impressive claims of good image quality at compression ratios of 1000:1 
have been made, but these have not been substantiated in comparative tests on large, 
diverse sets of natural images. 


VIDEO COMPRESSION 


OVERVIEW 

In video coding algorithms, both interframe and intraframe redundancies are 
removed from an image sequence. To remove intraframe redundancy, one of the still 
image compression algorithms discussed above is often used. The most common video 
coding schemes use block-based motion-compensated prediction to create a residual 
image that has very low energy and can, therefore, be coded with very few bits. A generic 
encoder is shown in Figure 3-6 and the corresponding decoder is shown in Figure 3-7. 
Most of the video coding algorithms using motion compensation fit into this framework, 
including MPEG 1 and 2. Comparing the figures, one notes that the video encoder 
actually contains a complete copy of the still frame decoder in its feedback loop. 
Excluding the still frame encoder and decoder, the two most complex blocks in 
Figure 3-6 are the motion estimation and rate buffer blocks. By effectively estimating the 
motion and compensating for it in the feedback loop, a residual image with little energy is 
formed; this can be coded by the still frame encoder with very few bits. As a general rule, 
however, the better the motion estimate is, the more complex the estimator must be. The 
rate buffer, on the other hand, does not require a lot of computational processing power to 
implement: it is complex because it must dynamically adjust the quantization in the still 
frame encoder so that a constant bit rate can be output without buffer over- or 
underflows. Performing this operation well requires large amounts of hand optimization 
in the design process and a lot of logic circuitry in the buffer controller. A number of 
possible rate buffer implementations have been proposed (References 3-27, 3-28, and 
3-29). 
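
The reason the encoder carries a copy of the decoder in its feedback loop can be seen in a stripped-down sketch. The example below is an assumption-laden simplification: frames are short lists of pixel values, the "still frame encoder" is reduced to a uniform quantizer, and motion compensation and the rate buffer are omitted entirely. All names are hypothetical.

```python
# Simplified predictive (DPCM) loop without motion compensation.
# The quantizer stands in for the still frame encoder; names are assumptions.

def quantize(value, step):
    return round(value / step)

def dequantize(level, step):
    return level * step

def dpcm_encode(frames, step):
    reference = [0] * len(frames[0])      # decoder-side reconstruction
    coded = []
    for frame in frames:
        levels = [quantize(p - r, step) for p, r in zip(frame, reference)]
        coded.append(levels)
        # Feedback loop: rebuild exactly what the decoder will rebuild.
        reference = [r + dequantize(q, step)
                     for r, q in zip(reference, levels)]
    return coded

def dpcm_decode(coded, step):
    reference = [0] * len(coded[0])
    frames = []
    for levels in coded:
        reference = [r + dequantize(q, step)
                     for r, q in zip(reference, levels)]
        frames.append(list(reference))
    return frames

step = 4
frames = [[10, 20, 30], [12, 19, 33], [15, 18, 35]]
decoded = dpcm_decode(dpcm_encode(frames, step), step)
```

Because the encoder predicts from the reconstructed frame rather than the original, quantization error does not accumulate from frame to frame: every decoded pixel stays within half a quantizer step of the original.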



FIGURE 3-6. Generic Hybrid DPCM-Transform Video Encoder. 


FIGURE 3-7. Generic Hybrid DPCM-Transform Video Decoder. 


VIDEO ENCODING AND DECODING 

The video encoder described by Figure 3-6 can produce three different types of 
video frames: I-frames, B-frames, and P-frames. A typical mix of such frames is shown 
in Figure 3-8. An I-frame is coded using only information from within that frame (i.e., 
intraframe coded), and, consequently, allows a clean break-in or resynchronization point 
within the video sequence. It is very important to have these break-in points to allow the 
receiver to resynchronize in the event of an uncorrected transmission bit error. The 
density of I-frames determines how many seconds of video are corrupted in the event of a 
transmission error. A P-frame, on the other hand, is the prediction residual from the last 
P- or I-frame in the sequence: it has no dependence on future frames. Obviously, P- 
frames do not offer break-in points, and errors from past frames will propagate into them. 
The last standard frame is a B-frame, which is bidirectionally predicted from both past 
and future frames. The highest compression is achieved with B-frames, but they cannot 
be used to predict future frames because they, themselves, depend on future frames. 

FIGURE 3-8. Typical Mix of Coded Frames. I-frames, 
P-frames, and B-frames are intra, unidirectionally- 
predicted, and bidirectionally-predicted frames, 
respectively. 
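
Because a B-frame depends on a future frame, that future I- or P-frame has to be coded and transmitted before the B-frames that precede it in display order. The sketch below shows one way to derive such a coding order. The function name and the seven-frame pattern are assumptions for illustration, and the rule used (each reference frame is sent ahead of the B-frames displayed before it) is the usual MPEG-style convention rather than something specified in this report.

```python
# Sketch: derive a coding (transmission) order from a display-order pattern.
# The pattern and function name are illustrative assumptions.

def coding_order(display_types):
    """display_types is a string such as "IBBPBBP" in display order.
    Returns the display indices in the order they would be coded."""
    order = []
    pending_b = []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)      # wait for the future reference
        else:
            order.append(index)          # I- or P-frame goes first
            order.extend(pending_b)      # then the B-frames it enables
            pending_b = []
    order.extend(pending_b)              # trailing B-frames, if any
    return order

order = coding_order("IBBPBBP")
```

For this pattern the coding order is frames 0, 3, 1, 2, 6, 4, 5, so the receiver must hold decoded reference frames and reorder them for display.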


By comparison, the operation of the video decoder as shown in Figure 3-7 is quite 
simple, with its most complex element being the still frame decoder. A frame reordering 
block is needed only if bidirectionally predicted frames are used and, while it increases 
the memory requirements of the decoder, it has no impact on its computational 
complexity. Note that the decoder shown in Figure 3-7 does not include motion- 
compensated interpolation. Using such interpolation greatly increases the computational 
complexity of the decoder with no cost increase in the encoder, making it well suited to 
the remote-sensing paradigm. 


3-D SUBBAND CODING 

Another video compression technology showing much promise in recent years is 
motion-compensated 3-D subband coding. A 3-D subband decomposition transforms the 
sequence not only in the two spatial dimensions but also in the temporal dimension, 
creating a time-frequency-space mapping of the sequence. In theory, this decomposition 
alone should be able to capture and localize motion information, but recent work has 
found that the addition of global motion compensation greatly improves performance 
(References 3-30 and 3-31). The video encoding and decoding algorithms shown in 
Figures 3-6 and 3-7 also describe the basic structure of a motion-compensated 3-D 
subband coder, where the “image encoder” and “image decoder” blocks now process sets 
of images. Obviously, this forces other blocks in the system, most notably the motion 
compensation block, to process sets of image frames as well. 


MOTION-COMPENSATED INTERPOLATION 

An important technique for improving the perceptual quality of compressed image 
sequences is bidirectionally motion-compensated interpolation, illustrated in Figure 3-9. 
The past and future frames are reconstructed, and then these are used along with motion 
information to interpolate the missing frames. In Figure 3-9, for example, the interpolated 
frame I_1 is a linear combination of four-fifths of the past frame and one-fifth of the future 
frame, where each of these has been motion compensated before being combined. To 
perform the motion compensation, the past frame is moved forward along the motion 
vector(s), while the future frame is moved backward along it (them). If multiple vectors 
are used in a block-based compensation scheme, overlapping and non-covered areas in 
the interpolated images must be considered. Bidirectional interpolation does not truly add 
any new information to the image sequence, but it does increase its perceived quality by 
maintaining a sharp, smooth blending of true image frames, simulating a far higher frame 
rate than was actually transmitted. Note that bidirectional interpolation is a part of the 
bidirectional predictive coding used in Figures 3-6 and 3-7: the bidirectionally 
interpolated image forms the estimate, which is subtracted from the original image to 
create the differential update (B-frame) in the encoder. The MPEG coding algorithm also 
allows for the use of motion-compensated interpolation in the decoder (Reference 3-1). 
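
The four-fifths and one-fifth weighting described above can be sketched for a single image row under one global, integer motion vector. The helper names, the 8-sample rows, and the 5-sample motion are assumptions for illustration; a real system would use per-block vectors and handle uncovered areas.

```python
# Sketch of bidirectional motion-compensated interpolation on one row.
# Single global integer motion vector; all names are assumptions.

def shift(row, displacement, fill=0):
    """Shift a row right by `displacement` samples (left if negative)."""
    n = len(row)
    return [row[i - displacement] if 0 <= i - displacement < n else fill
            for i in range(n)]

def interpolate(past, future, motion, t):
    """Interpolate the frame at temporal position t, 0 < t < 1, where
    `motion` is the displacement from the past to the future frame."""
    forward = round(t * motion)
    past_mc = shift(past, forward)               # move past frame forward
    future_mc = shift(future, forward - motion)  # move future frame backward
    return [(1 - t) * p + t * f for p, f in zip(past_mc, future_mc)]

past = [0, 100, 0, 0, 0, 0, 0, 0]      # object at sample 1
future = [0, 0, 0, 0, 0, 0, 100, 0]    # object at sample 6 (motion = 5)
frame = interpolate(past, future, motion=5, t=0.2)
```

With t = 1/5 the result weights the compensated past frame by four-fifths and the compensated future frame by one-fifth, and the object appears once, at sample 2. Blending the two frames without compensation would instead show two faint copies of the object.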

FIGURE 3-9. Motion Compensated Interpolation. 


MOTION ESTIMATION 

The motion estimation process is truly the heart of video compression in that it has a 
very large effect on the efficiency of the system and on the quality of the resulting output. 
The most common method currently used is block-based motion estimation, where blocks 
of pixels from the past frame are compared with blocks in the current frame in order to 
determine where they moved. Using these vectors, one can then compensate the past 
frame in Figure 3-6 for the motion (e.g., move the blocks to their new locations) before 
subtracting it from the current frame (Reference 3-32). The most common way to 
determine the movement of the blocks is to use full-search block matching, in which a 
spatial-domain correlation is performed between a block in the previous frame and a 
region around that block in the current frame. This is computationally expensive and it 
often results in poor motion estimates when displacements of less than one pixel are 
involved. To overcome this second problem, one can interpolate the blocks to achieve 
subpixel accuracy, but this further increases the computational complexity. Faster spatial 
correlation methods such as the three-point search have been developed to reduce the 
computational complexity (Reference 3-4), but these only converge to the optimal motion 
vector if the error surface is convex (which is not often the case). 
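
A minimal sketch of full-search block matching follows. It uses the sum of absolute differences as the matching criterion, which is an assumption (the text speaks only of a spatial-domain correlation), and the frame contents, block size, and search range are toy values chosen for illustration.

```python
# Sketch of full-search block matching with integer-pixel accuracy.
# SAD criterion, frame contents, and names are illustrative assumptions.

def block_sad(prev, cur, top, left, dy, dx, size):
    """Sum of absolute differences between a block of the previous frame
    and the block displaced by (dy, dx) in the current frame."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(prev[top + r][left + c]
                         - cur[top + dy + r][left + dx + c])
    return total

def full_search(prev, cur, top, left, size, search_range):
    """Test every displacement in the window and keep the best one."""
    height, width = len(cur), len(cur[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if not (0 <= top + dy and top + dy + size <= height):
                continue
            if not (0 <= left + dx and left + dx + size <= width):
                continue
            cost = block_sad(prev, cur, top, left, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]

def make_frame(top, left, size=8):
    frame = [[0] * size for _ in range(size)]
    patch = [[50, 90], [130, 200]]
    for r in range(2):
        for c in range(2):
            frame[top + r][left + c] = patch[r][c]
    return frame

prev = make_frame(2, 2)
cur = make_frame(3, 4)      # the patch moved down 1 and right 2
motion = full_search(prev, cur, top=2, left=2, size=2, search_range=3)
```

The cost of this approach is visible in the loops: each block requires up to (2R + 1)^2 candidate comparisons of size^2 pixels for a search range R, which is what the faster search methods try to avoid.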

Another class of motion vector estimation methods is based on frequency 
domain correlations. For example, block matching can also be done by taking the fast 
Fourier transforms (FFTs) of the past and current blocks, multiplying one by the complex 
conjugate of the other, and then taking the inverse FFT. This has the advantage over spatial 
domain correlation in that 
subpixel accuracy can be achieved simply by zero padding in the transform domain. 
Another approach is to perform the frequency domain correlation over the entire frame 
and then do block matching at the correlation spikes to determine which blocks moved 
where. This has proven to be especially effective for high quality interpolation using a 
variant frequency-domain method called phase correlation (References 3-33, 3-34, and 
3-35). Finally, we note that, thus far, pixel-based methods like optical flow 
(References 3-36, 3-37, and 3-38) have not been widely applied in image coding because 
of the high cost associated with coding the flow vectors. 
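
The frequency-domain approach can be illustrated in one dimension. The sketch below is a simplified form of phase correlation under several assumptions: it uses a naive discrete Fourier transform rather than an FFT to stay self-contained, it treats the shift as circular, and the signal values and function names are invented for the example.

```python
import cmath

# 1-D sketch of phase correlation. A naive DFT stands in for the FFT;
# names and test signals are illustrative assumptions.

def dft(x, inverse=False):
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out

def phase_correlation_shift(past, current):
    """Estimate the circular shift d such that current[t] = past[t - d]."""
    past_spectrum = dft(past)
    current_spectrum = dft(current)
    cross = []
    for p, c in zip(past_spectrum, current_spectrum):
        product = c * p.conjugate()
        magnitude = abs(product)
        # Keep only the phase; this sharpens the correlation peak.
        cross.append(product / magnitude if magnitude > 1e-12 else 0)
    surface = [v.real for v in dft(cross, inverse=True)]
    return max(range(len(surface)), key=lambda i: surface[i])

past = [0, 1, 4, 2, 0, 0, 0, 0]
current = [0, 0, 0, 0, 1, 4, 2, 0]     # past shifted right by 3 samples
shift_estimate = phase_correlation_shift(past, current)
```

The correlation surface has a single sharp spike at the displacement. In two dimensions the same computation over a whole frame produces one spike per dominant motion, which is the starting point for the block matching step described above.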


WC/BDI TRADEOFFS IN VIDEO CODER DESIGN 


The most important features in a video coding algorithm designed for WC and BDI 
applications are low encoding complexity, low latency, acceptable quality, flexible data 
type, and robustness to channel errors. All of these can be simultaneously achieved as 
long as one does not also constrain the bit rate. Of course, all realistic channels have 
bandwidth constraints, so perfectly satisfying all of these constraints in the same system 
is impossible. Instead, one must study the trade-offs associated with different design 
decisions. 


ENCODING COMPLEXITY 


Still Frame Encoding 

The conventional video encoder shown in Figure 3-6 contains a still frame encoder 
and decoder. In the case of a DCT- or wavelet-based algorithm, the inclusion of this 
feedback loop doubles the complexity of the video encoder. Using instead a non- 
symmetric still frame scheme like VQ or fractal coding greatly reduces the cost of adding 
the decoder to the feedback loop but greatly increases the basic cost of the encoder, 
leading to a net increase in computational complexity over a transform-based system. 
Given this, along with our earlier discussion about still-frame coding algorithms, we 
conclude that the best choice is a transform coder, although which one depends very 
much on the bit rate at which the system is designed to operate. Because both the forward 
and inverse transforms must be performed, the computational advantage of the block- 
based DCT over the wavelet transform becomes even more pronounced. On the other 
hand, if too few bits are allocated to code each I-frame, severe artifacts are introduced 
into the sequence by the transform, and these will only be exacerbated in the 
differentially coded frames. Consequently, despite their computational complexity, 
wavelet-based coders might be preferable at lower bit rates. 


Video Encoding 

As pointed out previously, the feedback loop in Figure 3-6 adds a great deal of 
computational complexity to the system. If subpixel interpolation is used in the motion- 
compensation block, the encoder complexity is increased even more. If the motion 
compensation can instead be performed in the transform domain, the still frame decoder 
block can be eliminated, greatly reducing the complexity. While this has been recently 
proposed (Reference 3-39), the idea is not yet proven. Better still would be the 
elimination of the entire feedback loop, but this would make it much more difficult to 
exploit temporal redundancy in the sequence. 


LOW LATENCY 

The need for low latency is most obvious in the weapons control problem: if the 
delay between when the image was acquired and when it is viewed by the pilot becomes 
too great, then by the time the pilot’s control input gets to the weapon it might be too late 
to update the weapon’s course. The latency issue is simplified, however, because the pilot 
need not directly control the weapon: he might simply perform the final target selection. 
Thus a few seconds of latency in the video may be acceptable, as long as a frame motion 
history is maintained in the weapon (i.e., the weapon knows where a spot designated in a 
past frame has moved to in the current frame). The issue of exactly how much latency 
can be tolerated for weapons control then depends largely on the speed and 
maneuverability of the weapons as well as the speed of the target. Because man-in-the-loop 
control is generally only used for ground targets and with relatively slow weapon 
platforms, it is likely that considerable delay can be tolerated. 

For the BDI problem, the requirements for video latency are less severe, but they 
still exist. Consider, for example, an encoder that introduces a latency of 2 seconds into 
the video stream. With such an encoder we will lose, on the average, the last second of 
video, which, if the BDI sensor is mounted on the weapon itself, could be critical 
information. Thus latency must also be considered in this application. 


ACCEPTABLE QUALITY 

While objective measures such as MSE (or, equivalently, PSNR) roughly 
characterize image quality, it is well known that they have many flaws. It is for this 
reason that human perceptual evaluations have formed the cornerstone of high definition 
TV (HDTV) evaluation. The same holds true for WC/BDI applications, but with a twist: 
it is not important that the video “looks good,” but rather it is only important that the 
video be usable. Thus pilots and analysts must be able to examine the video to recognize 
targets and to assess damage. A highly compressed image appears to be blurrier than the 
original, but the actual distortion is highly nonlinear. This means that subjective 
evaluations with the actual users of this video are necessary—evaluations that must be 
performed within a statistically valid framework to be meaningful. In other words, the 
evaluation must do more than simply ask for opinions: it must assess the ability of the 
subject to correctly perform his/her job. Little such data currently exist, making it 
impossible to accurately characterize the image quality required of a future WC/BDI 
system. 


FLEXIBLE DATA TYPES 

Logistically, it would make sense to use common, scalable image and video 
compression algorithms for all resolutions and types of sensors. Visible and IR sensors 
are the most likely candidates for WC/BDI compression, but one might wish to use the 
same datalink to also transmit Ladar and SAR images for other applications. Thus, it 
makes sense to design flexibility into the system. In a general sense, just using a digital 
communications system greatly increases the flexibility over older analog systems that 
had frame rates and resolutions built directly into the waveform. To achieve the best 
compression performance on all data types, however, it is important that the encoder be 
fully adaptive (i.e., that it not depend on a priori knowledge about the image set). Such 
knowledge is invariably sensor dependent and is unlikely to generalize well. This a priori 
information often appears in a coding algorithm in the form of VQ codebooks, 
nonuniform scalar quantizers, Huffman lookup tables, and arithmetic coder models. 
These last two are only a problem if they are fixed rather than image adaptive. 


ROBUSTNESS TO CHANNEL ERRORS 

The RF environment is a noisy one because every electronic device radiates some 
RF energy at some frequencies. In the military arena, this unintentional radiation is 
compounded by the problem of intentional jamming by the enemy. The process of 
compression or “source coding,” as it is called by Shannon (Reference 3-3), extracts 
redundancy from the input signal. If the source coding is optimal (i.e., all redundancy has 
been extracted) then a single bit error in transmission will turn the reconstructed signal at 
the receiver into noise. To protect the compressed signal, it is necessary to add 
redundancy back into it in such a way that errors can be detected and corrected. This is 
called “channel coding.” Shannon has shown that the optimal transmission signal (in 
terms of bit rate and robustness to channel errors) can be achieved by using separate 
source and channel coding (Reference 3-3). The two most common forms of channel 
coding used today are the block-based Reed-Solomon and BCH codes and the infinitely 
extending convolutional codes (Reference 3-40). Recently, research has begun on joint 
source-channel coding in an effort to decrease the complexity of the complete system, but 
such efforts are still in their early stages (Reference 3-41). 

The other major factor affecting both the bit throughput and the signal robustness is 
the analog modulation. Using advanced modulation techniques like quadrature amplitude 
modulation (QAM, Reference 3-42) or vestigial sideband modulation (VSB, 
Reference 3-43), it is possible to robustly transmit 20 megabits per second (Mbits/s) of 
digital data over a single 6-megahertz analog television channel. Even using MPEG 1 
compression with its nominal bit rate of 1.5 Mbits/s, one can transmit 13 digital channels 
in the space currently used for one analog channel. These advanced modulation schemes 
are optimized for Gaussian noise because that is generally a reasonable model for noise 
sources in such RF channels. Thus, while a Gaussian noise jammer is very hard to 
overcome for an analog video transmission system, it is actually the best case for a digital 
system. For the digital system, long burst errors are the worst case, but this problem can 
be minimized by scrambling the bit stream as it leaves the channel coder. Complete 
diagrams of the transmitter and the receiver are shown in Figures 3-10 and 3-11. 
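
One common way to realize the scrambling step is a block interleaver, sketched below. Treating the scrambler as a block interleaver is an assumption made for illustration, as are the function names and the choice of 4 codewords of 6 symbols each. The example tracks symbol positions rather than real data so that the effect of a burst is easy to see.

```python
# Sketch of a block interleaver for spreading burst errors.
# The interleaver choice, sizes, and names are illustrative assumptions.

def interleave(symbols, rows, cols):
    """Write row by row into a rows x cols array, read column by column."""
    assert len(symbols) == rows * cols
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(symbols, rows, cols):
    """Inverse operation performed in the receiver."""
    assert len(symbols) == rows * cols
    return [symbols[c * rows + r] for r in range(rows) for c in range(cols)]

rows, cols = 4, 6                      # 4 codewords of 6 symbols each
sent = list(range(rows * cols))        # positions used as symbol labels
channel = interleave(sent, rows, cols)

# A burst corrupts 4 consecutive symbols on the channel.
burst = set(channel[8:12])

# Count how many corrupted symbols land in each original codeword.
errors_per_codeword = [sum(1 for p in burst if p // cols == r)
                       for r in range(rows)]
```

The four-symbol burst is spread so that each codeword receives a single error, which a modest error-correcting code can repair, whereas without interleaving one codeword would have absorbed all four.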


[Block diagram: Video In to RF Out.] 

FIGURE 3-10. Complete Video Transmission System. 


[Block diagram: RF In to Video Out.] 

FIGURE 3-11. Complete Video Receiver. 

Unfortunately, little work has been done so far in characterizing interference sources 
in the military environment and quantifying their effects on the quality of digitally 
transmitted video. It is quite possible that unintentional RF radiation from the aircraft 
could be a greater problem than intentional jamming from the ground. Ultimately, one or 
more video transmission systems must be rigorously demonstrated in a realistic RF 
environment before these questions can be convincingly answered. 


CONCLUSION 


We have discussed a wide range of image and video coding algorithms, analyzing 
their suitability for WC and BDI applications. In the course of this investigation, we have 
concluded that while block-based DCT algorithms are superior for high bit rate, high 
quality video applications, wavelet-based algorithms are the better choice for low bit rate, 
reduced-quality video. Before actual systems can be adopted, however, there are still 
many questions to be answered. Of these, the two most important are what level of video 
quality is required for the WC and BDI applications and how the battlefield RF 
environment affects the video datalink. Once these questions have been answered, the 
problem of constructing a new digital video compression and communications system 
will be bounded, allowing the many engineering trade-offs associated with such a system 
to be made and ultimately resulting in the introduction of this technology onto the 
battlefield. 


CHAPTER 4. 

PARALLEL IMPLEMENTATION OF 
EMBEDDED COMPRESSION ALGORITHMS 


SUMMARY 

Here we consider the problem of parallelizing the basic wavelet transform by 
studying two methods of accomplishing this that are compatible with inter-scale coding 
techniques. Both of these methods use an overlap-save approach: one performs all data 
transfers before computation begins, while the other requires that data be exchanged 
between processors as the wavelet computations are proceeding. We study these two 
methods to determine the conditions under which each provides the optimal solution. 
Next, we present a multiple instruction, multiple data (MIMD) parallel partitioning of a 
complete image compression algorithm based on Shapiro’s EZW concept 
(Reference 4-1). The rate-distortion penalties associated with this straightforward 
parallelization are thoroughly quantified. Noting that the performance of this parallel 
algorithm might be unacceptably poor for large processor arrays, we discuss an alternate 
single instruction, multiple data (SIMD) algorithm, which achieves the same rate- 
distortion performance as the sequential EZW algorithm at the cost of higher complexity 
and reduced scalability. Finally, we examine an improved image compression algorithm 
developed by Said and Pearlman (Reference 4-2) and show how it can also be efficiently 
implemented in parallel using the approach developed for the EZW coder. 


INTRODUCTION 


In the last few years, wavelet-based image and video compression algorithms have 
been developed that are greatly superior to those based on the DCT in terms of both 
objective criteria (bit rate versus distortion) and subjective criteria (References 4-1 
through 4-4). Still, DCT-based coders have two important advantages over wavelet-based 
coders: lower computational complexity and a trivial parallel implementation. Both of 
these advantages are achieved because the 2-D DCT is implemented independently on 
non-overlapping 8x8 (or 16x16) blocks of image pixels. The use of non-overlapping 
blocks, however, is also the cause of the DCT-based coder’s most notorious defect: at 
low bit rates, blocking artifacts appear in the reconstructed image. 

In contrast, wavelet-based algorithms do not suffer from such artifacts, making them 
a better choice for low bit rate image and video coding applications. While one might 
consider using different filters to reduce the complexity of the wavelet transform (see 
Reference 4-5), one can never match the simplicity of the DCT. We note, however, that 
the problem of real-time video coding using wavelets is not truly one of raw 
computational complexity but rather one of throughput; the wavelet-based coder must be 
able to keep up with the sequence of video frames. While it might be possible to solve 
this problem using a single, very fast processor, it is probably more efficient with today’s 
high levels of VLSI integration to use an array of slower processors, each working on a 
portion of the problem. The efficiency of such a parallel solution hinges on two factors: 
the complexity of the operations performed by the individual processors and the amount 
of communications required between them. In this work, we consider the problem of 
efficiently parallelizing compression algorithms that exploit inter-scale redundancy, 
focusing primarily on Shapiro’s EZW algorithm (Reference 4-1) and extending work 
previously presented in Reference 4-6. Our baseline parallel system is an array of 
processing elements (PEs) with 2-D mesh interconnections, as shown in Figure 4-1. 
Unless otherwise noted, we assume that each of these PEs is an asynchronous MIMD 
device with its own memory (i.e., a distributed memory architecture). While many other 
authors have considered the general problem of parallelizing wavelet transforms 
(References 4-7 through 4-11), we study it here in the context of a specific coding 
algorithm and illustrate how the complete system can be implemented in an efficient 
manner. 



FIGURE 4-1. A Mesh-Interconnected Processor Array. 

We briefly discuss the sequential EZW image compression algorithm below, and 
then we consider parallel implementations of the basic wavelet transforms. The main goal 
here is to select an efficient parallelization that is compatible with the coding algorithm, 
and, to this end, we consider two variations of the overlap-save approach. We then 
present results quantifying the rate-distortion penalties associated with different levels of 
encoder parallelism. We will also introduce a modified architecture that eliminates these 
rate-distortion penalties, at the expense of added hardware requirements and reduced 
scalability. Then we discuss the general applicability of our parallel partitioning approach 
to the Said and Pearlman image compression algorithm. Finally, conclusions are 
presented. 


ZEROTREE COMPRESSION ALGORITHM 


Because a major goal of this work is to study the trade-offs associated with 
parallelizing EZW-based image compression, we first present a brief discussion of the 
basic sequential algorithm. The fundamental observation around which this coding 
algorithm is centered is that there is a strong correlation between insignificant 
coefficients at the same spatial locations in different wavelet scales (i.e., if a wavelet 
coefficient at a coarser scale is zero, then it is more likely that the corresponding wavelet 
coefficients at finer scales will also be zero). Figure 4-2 shows a 3-level, 2-D wavelet 
decomposition and the links that define a single zerotree structure. If the wavelet 
coefficient at a given scale is zero along with all of its descendants (as shown in 
Figure 4-2), then a special zerotree-root symbol (ZTR) is transmitted, eliminating the 
need to transmit the values of the descendants. Thus, the correlation of insignificance 
across scales results in a net decrease in the number of bits that must be transmitted. 



FIGURE 4-2. Wavelet Coefficient Mapping With 
One Complete Zerotree Shown. Note that the 
wavelet scale (s0, s1, etc.) is inversely 
proportional to the spatial frequency. 

In order to generate an embedded code (where information is transmitted in order of 
importance), Shapiro’s algorithm scans the wavelet coefficients in a bit-plane fashion. 
Starting with a threshold determined from the magnitude of the largest coefficient, the 
algorithm sweeps through the coefficients, transmitting the sign (+ or -) if a coefficient’s 
magnitude is greater than the threshold (i.e., it is significant), a ZTR if it is less than the 
threshold and is the root of a zerotree at the coarsest possible scale, or a 0 otherwise (this 
is the dominant pass). Next, for the subordinate pass, all coefficients deemed significant in 
the dominant pass are added to a second, subordinate list, which is itself scanned. One bit 
is transmitted for each coefficient on this list during the pass, decreasing its 
approximation error in the decoder by approximately that amount. The threshold is then 
halved and the two passes are repeated, with those coefficients having been found 
significant previously being replaced by zeros in the dominant pass (so that they do not 
inhibit the formation of future zerotrees). The symbol stream created by this scanning 
process is then passed through an arithmetic coder, to eliminate any remaining statistical 
redundancy before transmission to the decoder. This process continues until the bit 
budget is exhausted; at that point, the encoder transmits a stop symbol and its operation is 
terminated. The image decoder, on the other hand, simply passes the incoming bit stream 
through its arithmetic decoder and progressively builds up the significance map and the 
subordinate list in exactly the same way as they were created by the encoder. Because of 
this precise synchronization, the resolution enhancement bits transmitted during the 
subordinate pass do not need any location specifiers; the decoder knows the exact 
transmission order of these bits because it has reconstructed the same subordinate list as 
the encoder had at that point in the processing. 
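
The dominant pass described above can be sketched in a few lines of Python. This is only a minimal illustration and not the full coder of Reference 4-1: the coefficient tree is supplied as an explicit parent-to-children dictionary (a hypothetical representation chosen for brevity), a coefficient is assumed to be significant when its magnitude is at least the threshold, leaf coefficients are not given a separate symbol, and the subordinate pass and arithmetic coder are omitted.

```python
from collections import deque


def is_zerotree(node, coeffs, children, threshold):
    """Return True if `node` and all of its descendants are insignificant."""
    stack = [node]
    while stack:
        n = stack.pop()
        if abs(coeffs[n]) >= threshold:
            return False
        stack.extend(children.get(n, ()))
    return True


def dominant_pass(coeffs, children, roots, threshold):
    """Scan from coarse to fine and emit '+', '-', 'ZTR', or '0' symbols.

    coeffs   : dict mapping node id -> coefficient value
    children : dict mapping node id -> list of child node ids (finer scale)
    roots    : node ids at the coarsest scale, in scan order
    """
    symbols, significant = [], []
    queue = deque(roots)
    while queue:
        n = queue.popleft()
        c = coeffs[n]
        if abs(c) >= threshold:
            symbols.append('+' if c >= 0 else '-')
            significant.append(n)           # goes on the subordinate list
            queue.extend(children.get(n, ()))
        elif is_zerotree(n, coeffs, children, threshold):
            symbols.append('ZTR')           # descendants are never scanned
        else:
            symbols.append('0')             # isolated zero: keep descending
            queue.extend(children.get(n, ()))
    return symbols, significant
```

Before the next pass at half the threshold, the caller would replace the coefficients on the significant list by zeros so that they do not inhibit the formation of new zerotrees.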


PARALLEL WAVELET TRANSFORMS 


OVERVIEW 

It is well known that the total number of operations required to implement an 
algorithm in parallel is almost always larger than the number of operations required to 
implement it on a single processor. Thus, the optimal choice might be to simply pipeline 
the image compression (decompression) process by allocating one PE to each image 
frame as it comes in. For this to be a viable solution, the memory space controlled by 
each PE must be quite large. Specifically, if the original image has 8-bit pixels, then the 
EZW algorithm requires that 16 bits of memory be available for each image pixel (taking 
into account the dynamic range expansion of the transformation) plus an additional 3 bits 
per pixel for overhead information. Unfortunately, all of this memory must be available 
for the duration of the encoding/decoding process, because the embedded 
encoder/decoder must make numerous passes through the transformed coefficients. 
Pipelining also introduces at least one frame of delay for each PE in the pipeline. If one 
applied a large number of PEs to the coding task, a significant amount of throughput 
latency would be introduced into the end-to-end codec. The final problem with a 
pipelined encoder is input/output (I/O) collision. For the distributed memory MIMD 
system shown in Figure 4-1, this amounts to I/O contention along the interconnecting 
lines as multiple PEs try to access full images or to output coded coefficients at the same 
time. This is a problem even on shared memory MIMD systems having relatively few 
PEs. For example, in one set of experiments using the Texas Instruments (TI) 320C80 
processor (four integer digital signal processors (DSPs) with shared memory), a speed-up 
was achieved by going from one pipelined processor to two, but no further speed-ups 
were achieved when three or four processors were used. Memory contention was found to 
be the bottleneck. Because of these many drawbacks, we focus here on parallel wavelet 
transforms that can be implemented using an overlap-save architecture. These do not add 
structural latency (as opposed to computational latency) to the system, and they are well 
suited to a distributed memory architecture. Specifically, the image pixels can be 
regionally partitioned between PEs to minimize the communications cost incurred during 
the wavelet transformation and to eliminate the need for any communications during the 
encoding/decoding process. 
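
To give a sense of scale for the per-PE memory requirement of the pipelined approach, the short Python sketch below applies the figures quoted above (16 bits per pixel for the coefficients plus 3 bits per pixel of overhead) to a single frame. The function name and the choice of a 512x512 frame are illustrative assumptions.

```python
def pipeline_memory_bytes(width, height, coeff_bits=16, overhead_bits=3):
    """Memory one PE needs to hold a full frame during EZW coding.

    Uses the per-pixel figures quoted in the text for 8-bit input pixels:
    16 bits for each transformed coefficient plus 3 bits of overhead.
    """
    total_bits = width * height * (coeff_bits + overhead_bits)
    return (total_bits + 7) // 8  # round up to whole bytes


# A 512x512 frame needs 622,592 bytes (about 608 KiB) in every pipelined PE.
frame_bytes = pipeline_memory_bytes(512, 512)
```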

PERFORMANCE ANALYSIS: OVERLAP-SAVE IMPLEMENTATIONS 

A number of parallel implementations of the wavelet transform have been recently 
studied (References 4-7 through 4-11). In all but Reference 4-7, however, the parallel 
implementation has been developed without carefully considering the actual application 
(in this case, image compression). Unlike the blocked DCT transform, all wavelet 
transforms (except the very simple Haar transform) use basis functions with overlapping 
spatial support, complicating their implementation in parallel. One straightforward way 
of dealing with these overlapping basis functions is to use an overlap-save methodology, 
in which more samples are input to the regional wavelet transform operating in each PE 
than are output from it. Such overlapping can be performed either once in its entirety 
before starting the transform, so that the transform can be iterated to its maximum depth 
as shown in Figure 4-3 (Reference 4-7), or progressively as shown in Figure 4-4 
(References 4-9 through 4-11). 



FIGURE 4-3. Wavelet Decomposition Overlap Save for Entire Image. 
First perform all overlap, then decompose the image in each PE (no 
inter-processor communications are required). 

FIGURE 4-4. Wavelet Decomposition Using Overlap-Save at Each Scale. 
This requires inter-processor communication before the next level is 
computed. 


In the case of Figure 4-3, all of the overlap required for a wavelet decomposition 
down to the coarsest scale is performed at the beginning and no communication is 
required during processing, while the system of Figure 4-4 requires that coefficients be 
exchanged after each level of the wavelet decomposition is complete. The relative trade-off 
between these two methods depends on the cost of implementing arithmetic 
operations versus the cost of moving data around. If a wavelet decomposition having 
even-length lowpass and highpass filters, of lengths $N_l$ and $N_h$, is used, then the number of samples 
required from the left-hand ($O_\theta$) and the right-hand ($\tilde{O}_\theta$) PE (i.e., the PEs on either side 
of a specified PE in Figure 4-1) for each line or column of the image is 

$$O_\theta = \tilde{O}_\theta = \frac{N_\theta}{2} - 1 \qquad \text{(4-1)}$$

where $\theta$ is either $l$ (lowpass) or $h$ (highpass). If the lengths of the filters are odd, then the 
overlaps with left-hand and right-hand PEs along a scan line are given by 

$$O_l = \frac{N_l - 1}{2}, \quad O_h = \frac{N_h - 3}{2}, \quad \tilde{O}_l = \frac{N_l - 3}{2}, \quad \tilde{O}_h = \frac{N_h - 1}{2} \qquad \text{(4-2)}$$


with the difference between lowpass and highpass filters being caused by the staggered 
decimation required for symmetric extension at the borders (Reference 4-12). Depending 
on the lengths of the filters involved and the level of the wavelet decomposition, some of 
these samples may be drawn from non-adjacent PEs, increasing the transmission cost. At 
level k of the decomposition, the number of PEs completely spanned (i.e., every low 
frequency coefficient in that PE must be transmitted to an adjacent PE) by the overlap 
regions left and right of a specified PE is given by 


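$$P_{k,\theta} = \left\lfloor \frac{2^k O_\theta}{S} \right\rfloor \qquad \text{(4-3)}$$

and 

$$\tilde{P}_{k,\theta} = \left\lfloor \frac{2^k \tilde{O}_\theta}{S} \right\rfloor \qquad \text{(4-4)}$$

respectively, where $\lfloor \cdot \rfloor$ denotes the largest integer not exceeding the argument and $S$ is the size of 
the 1-D block of pixels initially contained within the PE. If each PE is treated 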
independently, then the transmission cost incurred by one processor at level k of the 
wavelet decomposition in processing a single row (or column) of the image is then given 
by 


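$$C_{k,\theta} = t_k \left[ 2^{-k} S \left( 2^{P_{k,\theta}} (1 - P_{k,\theta}) + 2^{\tilde{P}_{k,\theta}} (1 - \tilde{P}_{k,\theta}) - 2 \right) + 2^{P_{k,\theta}} O_\theta + 2^{\tilde{P}_{k,\theta}} \tilde{O}_\theta \right] \qquad \text{(4-5)}$$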


where 


$$t_k = 8 \cdot 2^k \qquad \text{(4-6)}$$


is the cost in bits of transmitting one wavelet coefficient between PEs at level k (the 
increase in cost is due to the dynamic range expansion of the wavelet coefficients at 
coarser scales). Equation 4-5 is applicable when the PEs do not have enough memory 
available to temporarily store all of the wavelet coefficients received in one data transfer 
cycle. If sufficient memory is available, however, then the communications cost in 
Equation 4-5 is dramatically reduced to 

$$C_{k,\theta} = t_k (O_\theta + \tilde{O}_\theta) \qquad \text{(4-7)}$$


with the assumption that the mesh is circularly connected and that circular convolution is 
used to cancel edge transients. To see this, consider the 1-D example illustrated by 
Figure 4-5 in which each PE initially contains two coefficients and where the filter’s left-hand 
overlap, $O_l$, equals 4. Examining the figure, we note that only four right-hand shifts 
of coefficients are required to accommodate the filter’s left-hand overlap; similarly, four 
left-hand shifts can take care of the filter’s right-hand overlap. Thus, Equation 4-7 is 
clearly true in this case. The total communications cost per PE for a depth-L 2-D wavelet 
transform using partial overlapping is therefore 


$$\mathrm{CPO}_L = 2S \left( \sum_{k=0}^{L-1} \max(C_{k,l}, C_{k,h}) \right) \qquad \text{(4-8)}$$


where each PE is assumed to start with an SxS block of image pixels. Using the fully 
overlapped method, added communications costs are incurred when the data are initially 
loaded into the parallel array. If the architecture supports direct memory access (DMA) 
from each PE to the image digitizer, this kind of communication can be considerably 
faster than the inter-processor links of the mesh. Also, in some cases, the PEs may have 
no direct links, which would greatly decrease the efficiency of transferring data between 
them. The communications overhead (in bits) required to load the extra samples into each 
PE of the array is given by 

$$\mathrm{CFO}_L = 16S \left( \max(O_l, O_h) + \max(\tilde{O}_l, \tilde{O}_h) \right) \cdot (2^{L+1} - L - 1). \qquad \text{(4-9)}$$
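
As a numerical check on the overlap and communications expressions, the Python sketch below evaluates them under our reading of Equations 4-1 through 4-4, 4-6, and 4-7. The function names are illustrative only. For odd-length filters the overlaps are taken to be (N - 1)/2 and (N - 3)/2, with the larger value on the left for the lowpass filter and on the right for the highpass filter; this assignment is an assumption based on the staggered decimation argument above.

```python
def overlaps(n_taps, band):
    """Return (left, right) overlap in samples for one filter.

    band is 'l' (lowpass) or 'h' (highpass). Even-length filters use
    Equation 4-1; odd-length filters use Equation 4-2 as read here.
    """
    if n_taps % 2 == 0:
        o = n_taps // 2 - 1
        return o, o
    if band == 'l':
        return (n_taps - 1) // 2, (n_taps - 3) // 2
    return (n_taps - 3) // 2, (n_taps - 1) // 2


def spanned_pes(k, overlap, block_size):
    """PEs completely spanned by an overlap region at level k (Eqs. 4-3, 4-4)."""
    return (2 ** k * overlap) // block_size


def coeff_bits(k):
    """Bits needed to send one coefficient at level k (Equation 4-6)."""
    return 8 * 2 ** k


def comm_cost_with_memory(k, left, right):
    """Per-row cost in bits when PEs can buffer a transfer (Equation 4-7)."""
    return coeff_bits(k) * (left + right)


# 9/7 biorthogonal wavelet with 32-pixel blocks (512x512 image, 16x16 PEs).
low_left, low_right = overlaps(9, 'l')      # 4 and 3 samples
high_left, high_right = overlaps(7, 'h')    # 2 and 3 samples
first_spanning_level = next(k for k in range(10)
                            if spanned_pes(k, low_left, 32) > 0)
```

With these numbers the lowpass overlap first spans an entire neighboring PE at level 3, which is where samples begin to be drawn from non-adjacent PEs.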



shift #2: {i,j,a,b} {a,b,c,d} {c,d,e,f} {e,f,g,h} {g,h,i,j} 

shift #4: {g,h,i,j,a,b} {i,j,a,b,c,d} {a,b,c,d,e,f} {c,d,e,f,g,h} {e,f,g,h,i,j} 

FIGURE 4-5. Example of Efficient Coefficient Transmission Using 
Partial Overlap Method. 
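
The shifting pattern of Figure 4-5 can be reproduced with a small simulation. The sketch below is an illustration only: it assumes a circularly connected 1-D row of PEs in which every PE passes exactly one coefficient to its right-hand neighbor per shift, and the function name is hypothetical.

```python
def circular_right_shifts(blocks, n_shifts):
    """Simulate right-hand shifts on a ring of PEs.

    blocks   : list of per-PE coefficient lists, all of the same length S
    n_shifts : number of shifts (one coefficient per link per shift)
    Returns each PE's buffer: received left-hand overlap followed by its
    own block.
    """
    S = len(blocks[0])
    ext = [list(b) for b in blocks]
    for _ in range(n_shifts):
        # The sample a neighbor needs next always sits at index S - 1 of
        # the sender's buffer, because the buffer grows by one on the left
        # with every shift.
        sent = [buf[S - 1] for buf in ext]
        ext = [[sent[p - 1]] + ext[p] for p in range(len(ext))]
    return ext


pes = [list("ab"), list("cd"), list("ef"), list("gh"), list("ij")]
after_two = circular_right_shifts(pes, 2)    # first PE holds i, j, a, b
after_four = circular_right_shifts(pes, 4)   # first PE holds g, h, i, j, a, b
```

Each PE transmits one coefficient per shift, so a left-hand overlap of 4 costs four coefficient transfers per PE regardless of how many PEs the overlap spans, which is the behavior Equation 4-7 describes.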


The partial overlap method also increases the computational cost of implementing a 
wavelet transform relative to that incurred using a sequential algorithm. Specifically, the 
cost is increased by 



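[Equation 4-10, the added computational cost $\mathrm{OPO}_L$ of the partial overlap method, is only partly legible in the source: it is a sum over decomposition levels of terms in $(O_l + \tilde{O}_l)$ and $(O_h + \tilde{O}_h)$.] (4-10) 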


where $c_l$ and $c_h$ are the cost of producing one low and one high frequency coefficient, 
respectively. For conventional direct form implementations of these filters (i.e., not 
taking advantage of any filter coefficient relationships), $c_l = 2N_l - 1$ and $c_h = 2N_h - 1$ 
operations (additions and multiplications). We do not consider multiplications and 
additions separately here because most modern signal processing chips can process both 
with equal speed. One can also calculate the added computational costs associated with 
the fully overlapped method. Specifically, for a 1-D block of pixels, the number of 
additional pixels that must be processed at scale k of the wavelet decomposition is 


$$A_{k,\theta} = (O_\theta + \tilde{O}_\theta)(2^{L-k} - 1) \qquad \text{(4-11)}$$
which, when summed over both filters, all scales, and two dimensions, results in a total 
additional cost of 

$$\mathrm{OFO}_L = 2S \sum_{k=0}^{L-1} \left( c_l A_{k,l} + c_h A_{k,h} \right). \qquad \text{(4-12)}$$

This computational increase is generally larger than that of the partial overlap 
method given by Equation 4-10. To graphically compare the two methods, we form the 
equality 

$$\mathrm{OFO}_L + a_1 \mathrm{CFO}_L = \mathrm{OPO}_L + a_2 \mathrm{CPO}_L \qquad \text{(4-13)}$$

where $a_1$ and $a_2$ weight the relative cost of communications versus computation in the 
two cases (i.e., a large value implies that the communications cost has more impact on 
total performance than the computational costs). To make comparisons, we solve 
Equation 4-13 for $a_1$, i.e., 

$$a_1 = a_2 \frac{\mathrm{CPO}_L}{\mathrm{CFO}_L} + \frac{\mathrm{OPO}_L}{\mathrm{CFO}_L} - \frac{\mathrm{OFO}_L}{\mathrm{CFO}_L}, \qquad \text{(4-14)}$$


and plot it for different values of L (the maximum level of the wavelet decomposition) as 
shown in Figure 4-6, using the 5/3 biorthogonal wavelet (#a/#s, where #a and #s are the 
lengths of the analysis and synthesis lowpass filters, respectively) (Reference 4-13). The 
size of the image is assumed to be 512x512, the processor array has 16x16 PEs, and 
Equation 4-7 is used in Equation 4-8 to compute the communications cost of partial 
overlapping (CPO). Studying the figure, we first note that when L = 1, the slope of the line is 45 
degrees, implying that both methods of overlapping are equivalent if their 
communications costs are equally weighted. Clearly, this must always be true. 
Furthermore, as L increases the slope of the line decreases, making full overlapping 
optimal only when its communications costs are proportionately less important than those 
of partial overlapping (as determined by the slope of the line). If wavelets with less 
compact support (i.e., longer filters) are used, we find that the slopes of these lines 
decrease even more rapidly as L increases. This implies that the communications 
advantage required for full overlapping to be equivalent to partial overlapping becomes 
progressively more difficult to achieve. When longer filters are used and Equation 4-5 is 
substituted for Equation 4-7 in Equation 4-8 (i.e., when the processors have insufficient 
memory), the slopes of the lines at first decrease as L increases but then begin to 
gradually increase, ultimately exceeding 45 degrees. Thus, the fully overlapped method 
actually becomes more efficient than partial overlapping for some values of $a_1$ and $a_2$. 
Finally, we should point out that any solution to Equation 4-14 that requires $a_1$ to be 
negative is not feasible (i.e., anything below the thick horizontal line in Figure 4-6). Such 
a solution is equivalent to saying that the communications requirement can actually 
reduce the overall cost of implementing the fully overlapped method. Partial overlapping 
is, therefore, always the optimal choice in the region where $a_1 < 0$. For the results 
presented in the “Parallel Implementation of the EZW Coder” section below, we use the 
partial overlap method (simulated on a conventional serial processor) along with the 9/7 
biorthogonal wavelet transform (Reference 4-13). Note that the selection of overlap 
methodology is irrelevant as far as the rate-distortion performance of the image coder is 
concerned because both parallel transforms result in identical wavelet decompositions. 
We use the 9/7 wavelet because it delivers excellent rate-distortion performance in image 
coding applications while not suffering too severely from ringing artifacts at low bit rates 
(Reference 4-14). 
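
The break-even relationship of Equation 4-14 and the feasibility condition on the weight of the fully overlapped method can be expressed compactly in code. The sketch below is illustrative: the function names are hypothetical and the cost values in the example are made-up numbers, not results from this study.

```python
def break_even_a1(a2, cpo, opo, ofo, cfo):
    """Weight a1 at which full and partial overlapping cost the same.

    Implements a1 = (a2 * CPO + OPO - OFO) / CFO, i.e., the equality
    OFO + a1 * CFO = OPO + a2 * CPO solved for a1.
    """
    return (a2 * cpo + opo - ofo) / cfo


def partial_overlap_always_wins(a2, cpo, opo, ofo, cfo):
    """True when the break-even a1 is negative, which is not feasible."""
    return break_even_a1(a2, cpo, opo, ofo, cfo) < 0


# Hypothetical costs in arbitrary units, for illustration only.
a1 = break_even_a1(a2=1.0, cpo=100.0, opo=50.0, ofo=80.0, cfo=200.0)
```

Plotting the returned value against the partial overlap weight for fixed costs gives a straight line of slope CPO/CFO, which is the line discussed above for each value of L.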



FIGURE 4-6. Comparison of Communications Costs With 
5/3 Biorthogonal Wavelet (Reference 4-13) and 5 Different Levels 
of Decomposition (512x512 image and 256 PEs). $a_1$ is the weight 
on the communications cost of the fully overlapped method while 
$a_2$ is the weight for partial overlapping. 


APPLICATION TO IMAGE COMPRESSION 

A very important question remains unanswered, however: how many PEs can one 
simultaneously and efficiently apply to the problem of EZW-based image compression? 
In References 4-9 through 4-11, the authors have tried to achieve the maximum amount 
of parallelism possible in their wavelet transforms, assigning one processor to each pixel 
in the original image. We contend here that such a strategy is wasteful for doing 
compression that exploits inter-scale redundancy (e.g., EZW or SPIHT (set partitioning in 
hierarchical trees)) because it is impossible to efficiently use all of the PEs to implement 
the coding and decoding portions of the compression algorithm. Why? Such coders 
exploit redundancy between coefficients at different scales corresponding to the same 
region in the original image (see Figure 4-2) by calculating zerotrees. These calculations 
require that the coefficients making up a zerotree be scanned repeatedly, and this can be 
accomplished most efficiently if they all reside on just one processor. Another 
disadvantage of the one PE per pixel approach is that the number of processors that can be 
applied to the transform itself decreases by a factor of 4 for each successive level of the 
wavelet decomposition, leading to inefficiency (Reference 4-10). Further, as the number 
of PEs operating on the transform decreases, the communications costs increase because 
data must be consolidated into the remaining processors from all over the mesh. In short, 
the one PE per pixel philosophy requires more processing resources than can be 
effectively applied to this type of image compression algorithm. While our basic 
approach to implementing the overlapped transform is the same as in References 4-9 
through 4-11, we do not attempt to assign one processor to each pixel of the image; 
rather, each processor is assigned enough pixels to ensure that it contains at least one full 
zerotree after the wavelet decomposition is complete. If the number of levels of wavelet 
decomposition is L, then the full zerotree restriction implies that the $X_I \times Y_I$ 
subimages initially loaded into each processor are selected such that $X_I \geq 2^L$ and $Y_I \geq 2^L$. 
This restriction also limits the number of parallel processors that can be easily applied to 
the problem. Specifically, if the original image dimensions are XxY, then the size of the 
parallel array, $X_A \times Y_A$, must satisfy $X_A \leq X/X_I$ and $Y_A \leq Y/Y_I$. Note that the subimage 
sizes $X_I$ and $Y_I$ must be powers of 2 to ensure that each processor includes only complete 
zerotrees (zero-padding can be used at the image boundaries if necessary). Clearly, the 
zerotree requirement significantly constrains the amount of parallelism allowed and 
reduces the flexibility with which processing elements can be applied (i.e., processors are 
optimally added in powers of 4). On the plus side, however, it ensures that all PEs remain 
busy during the wavelet processing and that the coding can be done in parallel with 
virtually no inter-processor communications. 
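
The effect of the full zerotree restriction on the usable array sizes can be enumerated directly. The sketch below is a minimal illustration that assumes square subimages and image dimensions that are already multiples of the subimage side (zero-padding is not modeled); the function name is hypothetical.

```python
def valid_pe_arrays(width, height, levels):
    """List (array_width, array_height, subimage_side) options.

    The subimage side must be a power of 2 no smaller than 2**levels so
    that every PE holds only complete zerotrees.
    """
    configs = []
    side = 2 ** levels
    while side <= min(width, height):
        configs.append((width // side, height // side, side))
        side *= 2
    return configs


# 512x512 image with a 5-level decomposition.
options = valid_pe_arrays(512, 512, 5)
pe_counts = [xa * ya for xa, ya, _ in options]   # 256, 64, 16, 4, 1
```

For a 512x512 image and five levels of decomposition, the largest usable array is 16x16 (256 PEs), and the remaining options differ by factors of 4, as noted above.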

The computational complexity of the inverse transform is the same as that of the 
forward transform while the amount of data movement required is essentially the same. 
For image decompression, a relatively small amount of data must be initially moved into 
each PE (the coded symbols), and the wavelet coefficients are then generated from these 
data (with each PE then containing complete zerotrees). Coefficients must then be traded 
among PEs so that each will have what it needs to invert one level of the wavelet 
transform in a specific region of the image. Coefficients are then traded again, with the 
process continuing until the complete image has been reconstructed. Thus, computing the 
inverse wavelet transform on an array of parallel processors is essentially the same 
operation as shown in Figure 4-4, but with the communications and (inverse) transform 
steps reversed. The amount of data that must be transmitted from one PE to another is the 
same as in Equation 4-5 for the forward transform, although one should note that in this 
case the processing is performed from coarse to fine scale. 


PARALLEL IMPLEMENTATION OF THE EZW CODER 


In this section, we quantify the trade-offs associated with the direct implementation 
of the EZW coding algorithm on multiple, parallel processors. By direct implementation 
we mean that a separate but identical copy of the zerotree encoder independently 
processes wavelet coefficients within each of the $P = X_A Y_A$ processors. To allocate bits 
among the P processors, we use the entropy of the image region contained within each 
one prior to the wavelet decomposition. Thus, a region with lower entropy (implying that 
it contains less information) receives a smaller bit allocation than one with higher 
entropy. Unfortunately, independently processing groups of wavelet coefficients with 
separate encoders reduces the statistical correlation in the scanned symbol streams, 
decreasing the efficiency of the arithmetic coding process. As the number of parallel 
processors increases in this simplistic scheme, the drop in arithmetic encoding efficiency 
results in reduced rate-distortion performance. 
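
An entropy-driven bit allocation of this kind can be sketched as follows. The text states only that the regional entropy is used, so two details here are assumptions: the entropy is the first-order entropy of the pixel histogram, and the bit budget is split in direct proportion to it. The function names are illustrative.

```python
import math
from collections import Counter


def entropy_bits_per_pixel(pixels):
    """First-order entropy of a region, from its pixel-value histogram."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def allocate_bits(regions, total_bits):
    """Split total_bits among regions in proportion to their entropy.

    regions is a list of flat pixel lists, one per PE. If every region
    has zero entropy the budget is split evenly.
    """
    h = [entropy_bits_per_pixel(r) for r in regions]
    total_h = sum(h)
    if total_h == 0:
        shares = [1 / len(regions)] * len(regions)
    else:
        shares = [x / total_h for x in h]
    alloc = [int(total_bits * s) for s in shares]
    # Give any bits lost to rounding to the region with the largest share.
    alloc[alloc.index(max(alloc))] += total_bits - sum(alloc)
    return alloc
```

A flat region therefore receives few or no bits while a busy region receives most of the budget, and the allocations always sum to the requested total.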

A very important consideration is how to get data into and out of the processor 
array. The information coming into the array (i.e., the image pixels) requires far more 
communications bandwidth than the data going out of the array. Therefore a real-time 
solution for video compression will probably not be effective without some sort of DMA 
from the individual PEs to the frame grabber. Most practical architectures that have been 
designed for video processing have such provisions (e.g., TI 320C80s and Analog 
Devices SHARCs). Because the data rate out of the encoder is much lower, we have 
more options. Because each PE is independently running its own EZW encoder on 
different data sets, the compressed output bits are produced in an asynchronous manner. 
Thus on a given clock cycle, one PE may produce a bit while another does not. There are 
two basic ways to solve this problem: (1) wait until every PE has produced its byte and 
then shift them out of the array or (2) store compressed data in the local memory of each 
PE and output it when the coding is complete. The first approach can significantly slow 
down processing and is, consequently, not preferred. If fast DMA access is available, the 
latter approach is much better. If all of the PEs try to output compressed data 
simultaneously, however, collisions can result, and the speed of the algorithm can be 
greatly reduced. This is where the asynchronous nature of the coding process actually 
becomes advantageous. The speed with which each PE finishes its coding task depends 
on the complexity of the image region corresponding to its wavelet coefficients: the more 
complex the region, the longer it takes. As more regional subdivisions are used (i.e., more 
processors), the variance of the processing times increases dramatically. Thus if each PE 
sends out its compressed bits when it finishes, it is unlikely that there will be too many 
other PEs also requesting data transfers. It is important to note as well that the amount of 
data being output is much less than the amount being input, so the system performance is 
primarily limited by the data path from the digitizer to the array. 

Obviously, the decoder suffers from a similar but inverted problem: a relatively 
small amount of compressed data is initially read into each PE, and a large amount of 
decompressed data is read out. For the same reasons as above, the speed at which image 
pixels can be read out is the key to achieving good real-time performance. In this case, 
the input data can be read in and processed as they are received, with the caveat that all of 
the wavelet coefficients must be decoded before the inverse wavelet transform is 
performed. For all practical purposes, the inverse wavelet transformation is completed 
simultaneously in all PEs, so the reconstructed image pixels are all available for output at 
the same time. Again, an architecture that supports DMA directly from the PEs’ local 
memory to a display device would be desirable here. 

The results of this parallel coding algorithm in terms of the bit rate (bits per pixel 
(bpp)) versus the peak signal to noise ratio are summarized in Figure 4-7 for the 512x512 
“Lena” image. For the results shown in Figure 4-7, each processor computes and 
transmits the mean of its block of pixels and the maximum wavelet coefficient value 
(“maxval”) of the block after transformation. In contrast, the conventional EZW encoder 
computes and transmits one global mean and one global maxval. Transmitting the local 
mean and maxval for each processor eliminates the need for any communications 
between the PEs during the coding process, but it also results in a significant rate- 
distortion penalty as the number of PEs increases. Worse yet, removing local mean 
values before the wavelet transformation (instead of the global mean value) results in 
blocking artifacts at low bit rates (see Figure 4-8). If instead a single mean and maxval 
are sent for the entire image, the rate-distortion performance of the parallel 
implementation greatly improves, as illustrated by Figure 4-9. Computing a global mean 
and maxval requires approximately $2(X_A + Y_A)$ communications operations per PE in the 
encoder, but this is a small price to pay for such a dramatic improvement in performance. 
For example, if 256 processors are used, the PSNR shown in Figure 4-9 is about 
2 decibels higher than that of Figure 4-7 at bit rates between 0.07 and 0.26 bpp. 
Figures 4-10 through 4-14 visually illustrate some of these parallel trade-offs for the Lena 
image coded at 0.25 bpp. 
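
For reference, the PSNR values quoted in this section can be computed as shown in the sketch below. It assumes the conventional definition for 8-bit imagery (peak value of 255 and mean squared error over all pixels); the function name is illustrative.

```python
import math


def psnr_db(original, reconstructed, peak=255):
    """Peak signal to noise ratio in decibels for two flat pixel lists."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2
              for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)
```

On this scale, a 2 decibel difference corresponds to a change in mean squared error by a factor of about 1.6.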

[Figure 4-7 plot; horizontal axis: bit rate (bits/pixel).] 

FIGURE 4-7. EZW Coding Results for the 512x512 Lena Image With a 
Separately Computed Mean and maxval for Each Processor. Each of the P 
processors operates on a square block of image pixels; alternating solid 
and dotted lines are included for clarity. 






[Figure 4-9 plot; horizontal axis: bit rate (bits/pixel), 0 to 2.5.] 

FIGURE 4-9. EZW Coding Results for the 512x512 Lena Image With 
Common Mean and maxval for All Processors. Each of the P processors 
operates on a square block of image pixels; alternating solid 
and dotted lines are included for clarity. 



FIGURE 4-10. Original Lena Image, 512x512 Pixels. 

FIGURE 4-12. 0.25 bpp, P = 4, Separate Means and maxvals, PSNR = 32.77 dB. 

When studying the figures, we note that the rate-distortion penalty incurred by using 
a parallel implementation seems to be directly proportional to the number of PEs (a 
number that increases exponentially in the plots). Thus, the penalty for using just four 
processors in parallel is quite acceptable while that for using 256 processors may not be 
(approximately 2 decibels lower PSNR than in the single processor case at all bit rates in Figure 4-9). 
We should also point out other features of a direct parallel implementation of the EZW 
algorithm. First, the embedded structure of the algorithm can be maintained by 
interleaving the bits output from the PEs before transmitting them. Because the data rate 
is relatively low (it operates on compressed data), interleaving can be performed with a 
relatively slow processor having only a modest amount of memory. It will, however, add 
latency to the system, so it may not be suitable for applications requiring very low 
throughput delay. In addition, because different regions of the image are coded 
independently by each PE, the parallel-structured coder is more resistant to transmission 
errors: if a single error occurs, only one section of the image is impacted (it will be 
reconstructed at lower resolution than the others) (Reference 4-15). In a conventional 
EZW image coder, on the other hand, one bit error reduces the quality of the entire 
reconstructed image. Finally, we note that a stop symbol is normally transmitted by each 
processor to terminate its bit stream. If the bit allocation is determined a priori, this 
added symbol can be completely eliminated, improving the rate-distortion performance 
slightly at lower bit rates and with higher levels of parallelism. 
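
One simple way to interleave the per-PE outputs is round-robin, taking one bit from each PE in turn. The sketch below illustrates this under two assumptions that are ours rather than the original design: the interleaving is done bit by bit in fixed PE order, and the decoder knows each PE's bit allocation in advance so that it can undo the interleaving without markers.

```python
def interleave(streams):
    """Round-robin merge of per-PE bit lists into one embedded stream."""
    out = []
    longest = max(len(s) for s in streams)
    for i in range(longest):
        for s in streams:
            if i < len(s):
                out.append(s[i])
    return out


def deinterleave(bits, lengths):
    """Recover the per-PE bit lists given each PE's known bit allocation."""
    streams = [[] for _ in lengths]
    it = iter(bits)
    for i in range(max(lengths)):
        for p, n in enumerate(lengths):
            if i < n:
                streams[p].append(next(it))
    return streams
```

Because every PE's most important bits appear early in the merged stream, truncating it still leaves a coarse reconstruction of every region of the image.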

Whether global or local mean and maxval quantities are transmitted, the complexity 
of the parallel zerotree decoder is the same. In either case, the appropriate mean and 
maxval are first broadcast to each processor and then the bit stream is parsed and 
properly distributed among the processors as it arrives. The decoders operating in each 
PE are identical to the sequential EZW decoder of Reference 4-1 and are completely 
independent of each other, making the overall parallel implementation very efficient and 
highly scalable. 


MODIFIED PARALLEL IMPLEMENTATION 


As noted above, a significant performance penalty is incurred when the EZW 
algorithm is directly mapped to multiple processors. We propose here an architecture that 
eliminates this penalty at the expense of added computational complexity. Figure 4-15 
illustrates the basic idea. The processors in the mesh perform the transform, calculate the 
zerotrees, and output the symbols (+, -, 0, and ZTR (zerotree root) for the dominant list) 
to a single processor, which codes them using a small-alphabet arithmetic encoder. It 
turns out that all of these processes can be performed using synchronous SIMD PEs and, 
in fact, that there is no real advantage to even considering MIMD devices because rigid 
output synchronization must be maintained. Other authors have shown that the wavelet 
transform can be implemented on an SIMD array (Reference 4-9), but we have 
determined that the zerotree formation and symbol generation processes can also be 
performed synchronously. The details of this implementation are beyond the scope of this 
paper, but suffice it to say that each PE executes exactly the same instruction at the same 
time on different data: our SIMD code contains no “if” statements whatsoever. To 
maintain this structure, a concession must be made. Each PE would normally generate 
output symbols in an asynchronous way because of the compression inherent in the ZTR 
symbol and the fact that differing numbers of significant symbols (i.e., + and -) are 
generated. However, to maintain synchronization between the PEs and the output 
processor, each PE must output its symbol at a predetermined time. If a given PE does 
not have a true code symbol to send, then a dummy symbol is used instead, which the 
output processor knows to ignore. The amount of communications required between the 
array and the output processor is clearly somewhat high, but because only one symbol 
(2 or 3 bits) is sent for every 20 instructions executed, it should be feasible with many 
architectures (i.e., those which can execute a communications and an arithmetic operation 
simultaneously) to use a bucket-brigade style pipeline to keep the symbols flowing 
toward the output processor. In fact, if this approach is used, no symbol reordering is 
needed prior to arithmetic encoding at the output processor. In addition, by using a 
common mean and maxval for all processors, the parallel architecture described by 
Figure 4-15 can achieve performance identical to that of the conventional zerotree 
algorithm. In order to achieve a fixed bit rate output stream, the output processor must 
also monitor its bit production and turn off the mesh processors when the desired bit rate 
is achieved. Finally, one should note that while the output processor does have to run 
fairly fast, it does not have to run P-times as fast (where P is again the number of parallel 
processors) because each PE executes approximately 20 instructions for each symbol it 
creates, and most of the symbols are discarded without arithmetic encoding. 
FIGURE 4-15. Modified Parallel EZW Image Coding Architecture. 
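
The lockstep output scheme described above can be illustrated with a minimal Python sketch. It is not the SIMD code used in this work: the `DUMMY` marker, the function names, and the per-slot dictionaries are assumptions made purely for illustration. Each PE emits exactly one symbol per time slot, substituting a dummy when it has nothing to send, and the output processor discards the dummies before arithmetic encoding.

```python
DUMMY = "DUMMY"  # hypothetical placeholder; the report does not name the dummy symbol


def lockstep_output(pe_symbols, num_slots):
    """Collect one symbol per PE per time slot, in fixed PE order.

    pe_symbols[p] is a dict mapping a time slot to the real symbol that
    PE p produces in that slot; slots with no entry yield DUMMY.
    """
    stream = []
    for slot in range(num_slots):
        for per_pe in pe_symbols:
            stream.append(per_pe.get(slot, DUMMY))
    return stream


def output_processor(stream):
    """Drop dummy symbols; the survivors would go to the arithmetic coder."""
    return [symbol for symbol in stream if symbol != DUMMY]


# Two PEs, three time slots: PE 0 has symbols in slots 0 and 2, PE 1 in slot 1.
pe_symbols = [{0: "+", 2: "ZTR"}, {1: "-"}]
raw = lockstep_output(pe_symbols, num_slots=3)
coded = output_processor(raw)
```

Because every PE writes in every slot, the raw stream length is fixed at (number of PEs) x (number of slots) regardless of the data, which is what keeps the array synchronous.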

A parallel decoder for this modified scheme has essentially the same architecture as 
the encoder (i.e., an input processor that performs the arithmetic decoding and an array of 
mesh processors that successively approximate the wavelet coefficients). Unfortunately, 
in this case more processing must be performed in the input processor. Specifically, the 
incoming bits for a given pass (i.e., the symbols generated by processing 1 bit plane of 
the wavelet coefficients) are first arithmetically decoded and the ZTR symbols converted 
into appropriate blocks of zero-valued coefficients. These symbols must then be 
transferred to the processor array, where non-zero coefficients are approximated and 
where, eventually, the inverse wavelet transformation is performed. While the parallel 
encoder operates in a more or less flow-through fashion, the decoder must operate in 
what is basically a batch mode. Furthermore, it requires a more powerful input processor 
relative to the power of its PE array when compared to the encoder, because it must also 
decode the ZTR symbols. One should note, however, that an EZW decoder is always less 
complex than its corresponding encoder, so it would still probably be possible to 
implement them both using the same single/multiple processor hardware configuration. 
PERFORMANCE COMPARISONS ON THE TI 320C80 


Thus far, we have only presented performance comparisons in which parallel 
operation has been simulated on a sequential processor. While such analysis provides 
valuable insight into some of the design trade-offs that are possible within a parallel 
framework, it is not completely satisfactory. Ultimately, one would like to actually run 
the algorithm in parallel and find out how it really works. As part of an ongoing project, 
we have implemented wavelet compression and decompression algorithms on a TI 
320C80 multimedia video processor. This chip contains 5 MIMD parallel processors, 4 integer 
and 1 floating point, with cache-controlled global memory, and it is capable of 
performing in excess of 1.8 billion operations per second. To perform image analysis in 
the encoder, we use a 5/3 biorthogonal wavelet transform operating on the four integer 
PEs, with partially overlapped parallel partitioning. The video frames fed into our 
encoder are 512x240 pixels, and we decompose each to 4 levels vertically and 5 
horizontally. Next, the wavelet coefficients created by this decomposition are encoded 
separately by each of the 4 integer PEs as described above. The encoder used here is very 
similar to the one summarized in the “Imaging Platform” section of this report, except 
that it contains some enhancements that provide additional speedups (References 4-16 
and 4-17). Considering execution times for the wavelet transform alone, running on all 4 
of the PEs in parallel is 3.22 times faster than running on just 1 sequentially (16 versus 
52 milliseconds for a single frame of video). This is about what we would expect because 
the overlap inherent in the wavelet transform requires each PE to process and 
communicate extra data. When just coding the coefficients, however, the 4-processor 
system is 3.96 times faster (90 versus 356 milliseconds): almost exactly a linear speedup, 
as expected! Overall, we achieve a frame rate of 9.4 frames per second (fps) using the 
parallel system versus 2.4 fps for an identical sequential system. In both cases, the rate- 
distortion performance is exactly the same because it is limited by the cache-based 
zerotree processing requirement imposed by Reference 4-18. 
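
The quoted speedups and frame rates can be checked with a few lines of Python. This is only a sanity check under the assumption that the time per frame is the sum of the transform and coding times given above; the function names are illustrative. Note that the rounded transform times (52 and 16 milliseconds) give a ratio of 3.25 rather than the reported 3.22, presumably because the published times are rounded.

```python
def speedup(sequential_ms, parallel_ms):
    """Ratio of sequential to parallel execution time."""
    return sequential_ms / parallel_ms


def frames_per_second(*stage_ms):
    """Frame rate when the per-frame time is the sum of the stage times (assumed)."""
    return 1000.0 / sum(stage_ms)


transform_speedup = speedup(52, 16)          # wavelet transform only
coding_speedup = speedup(356, 90)            # coefficient coding only
parallel_fps = frames_per_second(16, 90)     # 4 PEs
sequential_fps = frames_per_second(52, 356)  # 1 PE
```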


PARALLEL COMPRESSION USING 
SPATIAL ORIENTATION TREES 


An image compression algorithm was recently introduced by Said and Pearlman that 
also exploits inter-scale correlation to achieve excellent rate-distortion performance 
(Reference 4-2), better, in fact, than the original EZW algorithm. This new algorithm 
organizes the wavelet coefficients into spatial orientation trees that are then used to form 
coefficient sets and searched (to determine a coefficient’s significance) in a way similar 
to the EZW coder. As with EZW, this coder can be parallelized efficiently with no or 
minimal communications if a complete spatial orientation tree is contained within each 
PE. To ensure this, the image subdivision must be formed so that every PE contains 
4 complete zerotrees. If the number of levels of wavelet decomposition is again L, this 
constraint implies that the size of the subimage in each processor prior to the wavelet 
decomposition, $X_s \times Y_s$, must be at most $X_s = 2^{L+1}$ and $Y_s = 2^{L+1}$, which in turn limits the 
size of the parallel array to $X_A = X/X_s$ and $Y_A = Y/Y_s$ (for an $X \times Y$ image). Thus, using 
this coding algorithm in place of the conventional EZW algorithm reduces the allowable 
parallelism by a factor of 4. Because of their more sophisticated arithmetic coder, it is 
also likely that an increased penalty might be paid for higher levels of parallelism when 
using the direct implementation discussed above. The modified parallel architecture 
discussed here should also work with this algorithm and would not significantly reduce 
the rate-distortion performance if properly implemented. 


CONCLUSIONS 


In this work, we have considered the complete problem of implementing a wavelet- 
based image compression algorithm on an array of processors operating in parallel. In 
particular, we have considered both MIMD and SIMD implementations, discussing their 
strengths and weaknesses. As part of this process, we have studied parallel 
implementations of the wavelet transform and have postulated the “ideal” level of 
parallelism for this application (i.e., enough PEs so that every one contains at least one 
complete zerotree after decomposition). We have also analyzed the partitioning of the 
coding algorithm itself, quantifying the performance penalty incurred by parallelization 
and introducing a way to overcome this penalty by using a distributed memory SIMD 
architecture. While much past research has been done on parallelizing the wavelet 
transform, it has seldom been applied in the context of a complete image compression 
application (i.e., it has not considered parallelization of the quantization and coding 
processes). We have done that here for an important class of high performance algorithms 
that exploit inter-scale redundancies. In a more general sense, we hope that our work has 
reinforced the view that it is important to consider the entire application, not just one part 
of it, when studying the problem of efficient parallel implementation. 
CHAPTER 5. 

ROBUSTNESS TO TRANSMISSION ERRORS 


SUMMARY 

In this work, we present a new family of image compression algorithms derived 
from Shapiro’s EZW coder. These algorithms introduce robustness to transmission errors 
into the bit stream, while still preserving its embedded structure. This is done by 
partitioning the wavelet coefficients into groups, coding each group independently, and 
interleaving the coded bit streams for transmission; thus if one bit is corrupted, then only 
one of these bit streams will be truncated in the decoder. To evaluate these algorithms, 
we have analyzed them in a stochastic framework. By combining the results of this 
analysis with the actual robust EZW coder, we can objectively and subjectively compare 
the expected reconstructed images generated by the new algorithms to those output by the 
original. These comparisons clearly show that all of the members of the robust family are 
superior to the conventional algorithm in realistic transmission environments; further, 
they facilitate the selection of the optimal member of this family for a given channel. 
Finally, we note that the new algorithms do not increase the complexity of the overall 
system and, in fact, that they are far more easily parallelized than the conventional EZW 
coder. 


INTRODUCTION 


Recently, the proliferation of wireless services and the Internet along with the 
consumer demand for multimedia products have spurred interest in the transmission of 
image and video data over noisy communications channels whose capacities vary with 
time. In such applications, it can be advantageous to combine the source and channel 
coding (i.e., compression and error correction) processes from both a complexity and an 
information theory standpoint (Reference 5-1). In this chapter, we introduce a form of 
low complexity joint source-channel coding in which varying amounts of transmission 
error robustness can be built directly into an embedded bit stream. The approach taken 
here modifies Shapiro’s EZW image compression algorithm (Reference 5-2), but the 
basic idea can be easily applied to other wavelet-based embedded coders such as those of 
Said and Pearlman (Reference 5-3) and Taubman and Zakhor (Reference 5-4). Some 
preliminary results using this approach have previously been presented (see 
References 5-5 and 5-6). 
This chapter is organized as follows: we first discuss the conventional EZW image 
compression algorithm and its resistance to transmission errors. Then we develop our 
new, robust coder and explore the options associated with its implementation. We then 
analyze the performance of the robust algorithm in the presence of channel errors and we 
use the results of this analysis to perform comparisons. Finally, we discuss 
implementation and complexity issues, followed by our conclusions. 


EZW IMAGE COMPRESSION 


After performing a wavelet transform on the input image, the EZW encoder 
progressively quantizes the coefficients using a form of bit plane coding to create an 
embedded representation of the image (i.e., a representation in which a high resolution 
image also contains all coarser resolutions). This bit plane coding is accomplished by 
comparing the magnitudes of the wavelet coefficients to a threshold T to determine which 
of them are significant: if the magnitude is greater than T, that coefficient is significant. 
As the scanning progresses from low to high spatial frequencies, a 2-bit symbol is used to 
encode the sign and position of all significant coefficients. This symbol can be a + or -, 
indicating the sign of the significant coefficient; a 0 indicating that the coefficient is 
insignificant; or a ZTR indicating that the coefficient is insignificant along with all of the 
finer resolution coefficients corresponding to the same spatial region. The inclusion of 
the ZTR symbol greatly increases the coding efficiency because it allows the encoder to 
exploit inter-scale correlations that have been observed in most images (Reference 5-2). 
After computing the “significance map” symbols for a given bit plane, resolution 
enhancement bits must be transmitted for all significant coefficients; in our 
implementation, we concatenate two of these to form a symbol. Prior to transmission, the 
significance and resolution enhancement symbols are arithmetically encoded using the 
simple adaptive model described in Reference 5-7, with a 4-symbol alphabet (plus 1 stop 
symbol). The threshold T is then divided by 2 and the scanning process repeated until 
some rate or distortion target is met. At this point, the stop symbol is transmitted. The 
decoder, on the other hand, simply accepts the bit stream coming from the encoder, 
arithmetically decodes it, and progressively builds up the significance map and 
enhancement list in the exact same way as they were created by the encoder. 
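
The significance-map step described above can be sketched in a few lines of Python. This is a simplified illustration, not the coder used in this work: the tuple-based tree representation `(value, [children])` and the function name are assumptions, only a single pass at one threshold is shown, leaves are allowed to emit ZTR, and the resolution enhancement bits and arithmetic coding are omitted.

```python
from collections import deque


def dominant_pass(root, threshold):
    """Return significance symbols for one bit plane, scanning coarse to fine.

    root is a nested tuple (value, [children]), where children are the
    finer-resolution coefficients covering the same spatial region.
    """

    def subtree_insignificant(node):
        value, children = node
        return abs(value) <= threshold and all(
            subtree_insignificant(child) for child in children
        )

    symbols = []
    queue = deque([root])
    while queue:
        value, children = queue.popleft()
        if abs(value) > threshold:
            symbols.append("+" if value > 0 else "-")
            queue.extend(children)
        elif all(subtree_insignificant(child) for child in children):
            symbols.append("ZTR")  # zerotree root: descendants are not scanned
        else:
            symbols.append("0")    # insignificant, but some descendant is significant
            queue.extend(children)
    return symbols


# Small hypothetical tree with two levels of descendants.
tree = (34, [(-20, [(3, []), (-2, [])]), (5, [(1, []), (40, [])])])
symbols_t16 = dominant_pass(tree, threshold=16)
all_small = dominant_pass((10, [(2, [(1, []), (0, [])]), (3, [])]), threshold=16)
```

In the second call, the whole tree collapses into a single ZTR symbol, which is the source of the coding gain attributed to inter-scale correlation.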

The embedded nature of the bit stream produced by this encoder provides a certain 
degree of error protection. Specifically, all of the information that arrives before the first 
bit error occurs can be used to reconstruct the image; everything that arrives after is lost. 
This is in direct contrast to many compression algorithms where a single error can 
irreparably damage the image. Furthermore, we have found that the EZW algorithm can 
actually detect an error when its arithmetic decoder terminates (by decoding a stop 
symbol) before reaching its target rate or distortion. It is easy to see why this must 
happen. Consider that the encoder and decoder use the same backward adaptive model to 
calculate the probabilities of the 5 possible symbols (4 data symbols plus the stop 
symbol) and that these probabilities directly define the codewords. Not surprisingly, the 
length of a symbol’s codeword increases as its probability decreases (ideally, it is $-\log_2$ of that probability). If a completely 
random bit sequence is fed into the arithmetic decoder, then the probability of decoding 
any symbol is completely determined by the initial state of the adaptive model (i.e., the 
probability weighting defined by the model is not, on the average, changed by a random 
input). 

In our implementation of the Witten et al. arithmetic coder of Reference 5-7, we set 
Max_frequency equal to 500 and maintain the stop symbol probability at 1/cum_freq[0]. 
Because cum_freq[0] (the sum of the frequency counts of all symbols) is divided by 2 
whenever it exceeds Max_frequency, the probability of decoding a stop symbol stays 
mostly between 1/250 and 1/500. Thus, if a random bit stream is fed into the decoder 
after training it to this point, an average of 250 to 500 symbols will be processed before 
the stop symbol is decoded. The bit stream is correctly interpreted as long as the decoder 
is synchronized with the encoder, but this synchronization is lost shortly after the first 
error occurs. Once this happens, the incoming bit stream looks random to the decoder 
(the more efficient the encoder, the more random it will appear). Because each symbol is 
represented in the compressed image by between one and two bits, the decoder should 
self-terminate between 31 and 125 bytes after an error occurs. Experimentally, we have 
found that the arithmetic decoder overrun is typically between 30 and 50 bytes, which is 
consistent with the theoretical range because most of these terminations took place while 
decoding the highly compressed significance map. If the overrun is small compared to the 
number of bits correctly decoded, it does not significantly affect the quality of the 
reconstructed image. While some erroneous information is incorporated into the wavelet 
coefficients, the bit plane scanning structure ensures that it is widely dispersed spatially, 
making it visually insignificant in the image. 
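
The theoretical overrun range quoted above follows from simple arithmetic, reproduced here as a short Python check. The function names are illustrative, and the calculation assumes that symbols decoded from a random bit stream hit the stop symbol independently with a fixed probability, so that about 1/p symbols are processed on average.

```python
def expected_symbols_before_stop(stop_probability):
    """Mean of a geometric distribution: about 1/p symbols until the stop symbol."""
    return 1.0 / stop_probability


def overrun_bytes(num_symbols, bits_per_symbol):
    """Convert a symbol count to bytes at a given average symbol cost."""
    return num_symbols * bits_per_symbol / 8.0


# Stop probability stays between 1/500 and 1/250; symbols cost one to two bits.
low_overrun = overrun_bytes(expected_symbols_before_stop(1 / 250), bits_per_symbol=1)
high_overrun = overrun_bytes(expected_symbols_before_stop(1 / 500), bits_per_symbol=2)
```

The lower end (about 31 bytes) corresponds to the cheapest symbols and the highest stop probability, which is consistent with the observation that most terminations occur in the highly compressed significance map.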


REZW ALGORITHM 


BASIC APPROACH 

As shown in Figure 5-1, the basic idea of the REZW image compression algorithm is 
to divide the wavelet coefficients up into S groups and then to quantize and code each of 
them independently so that S different embedded bit streams are created. These bit 
streams are then interleaved as appropriate (e.g., bits, bytes, packets, etc.) prior to 
transmission so that the embedded nature of the composite bit stream is maintained. In 
the remainder of this chapter, we assume that individual bits are interleaved. For the 
REZW approach to be effective, each group of wavelet coefficients must be of equal size 
and must uniformly span the image. A similar method has been proposed in 
Reference 5-8 to parallelize the EZW algorithm, but that method instead groups the 
coefficients so that data transmission between processors is minimized. 

FIGURE 5-1. Structure of the REZW Algorithm. 
What do we gain by using this new algorithm over the conventional one? As pointed 
out above, the EZW decoder can use all of the bits received before the occurrence of the 
first error to reconstruct the image. By coding the wavelet coefficients with multiple, 
independent (and interleaved) bit streams, a single bit error truncates only one of the 
streams; the others are still completely received. Consequently, the wavelet coefficients 
represented by the truncated stream are reconstructed at reduced resolution, while those 
represented by the other streams are reconstructed at the full encoder resolution. If the set 
of coefficients in each stream spans the entire image, then the inverse wavelet transform 
in the decoder evenly blends the different resolutions so that the resulting image has a 
spatially consistent quality. 
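
The benefit of interleaving independent streams can be illustrated with a small Python sketch. The function names are hypothetical, streams are represented as plain lists of bits, and decoder overrun is ignored: each stream is assumed to be usable exactly up to its first corrupted bit.

```python
def interleave(streams):
    """Bit-interleave equal-length streams: s0[0], s1[0], ..., s0[1], s1[1], ..."""
    length = len(streams[0])
    if any(len(stream) != length for stream in streams):
        raise ValueError("all streams must have the same length")
    return [stream[i] for i in range(length) for stream in streams]


def deinterleave(bits, num_streams):
    """Recover the individual streams from the composite bit stream."""
    return [bits[i::num_streams] for i in range(num_streams)]


def usable_bits(total_bits, num_streams, error_positions):
    """Bits received before the first error in each stream.

    error_positions are indices of corrupted bits in the composite stream.
    """
    usable = [total_bits // num_streams] * num_streams
    for position in error_positions:
        stream, offset = position % num_streams, position // num_streams
        usable[stream] = min(usable[stream], offset)
    return usable


streams = [[0, 1, 1], [1, 0, 0]]
composite = interleave(streams)
recovered = deinterleave(composite, num_streams=2)

# One bit error at composite position 5 of a 12-bit stream.
robust = usable_bits(12, num_streams=4, error_positions=[5])
conventional = usable_bits(12, num_streams=1, error_positions=[5])
```

With four streams, the single error truncates only one stream and 10 of the 12 bits remain usable; with a single stream, only the first 5 bits survive.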


ZEROTREE PRESERVING (ZP) PARTITIONING 


Figure 5-2 graphically illustrates this wavelet coefficient partitioning for S = 4 bit 
streams and three wavelet scales. In the figure, each coefficient with the same shade of 
gray maps to the same group and is, therefore, processed by the same encoder. 
Furthermore, boxes with Xs in them are used to define the elements of one zerotree, and 
we note that this zerotree is identical to those used by the conventional EZW coder 
(Reference 5-2); hence the name “zerotree preserving.” It is clear from this example that 
all of the elements from a given zerotree are fed into the same encoder and thus, that the 
correlation of insignificant coefficients between scales is fully exploited. Note that S can 
be increased by powers of 4 until the encoder in each stream processes just one zerotree. 
If the image is of size $X \times Y$ (assuming for simplicity that both of these are powers of 2) 
and the maximum number of scales (NS) of wavelet decomposition is used, then the single 
zerotree limitation implies that the maximum number of independent bit streams allowed 
is $S = (X \cdot Y)/4^{NS}$. While introducing more bit streams is advantageous in terms of 
robustness to errors, it also results in reduced rate-distortion performance when no 
transmission errors occur (see the “Results” section below). The partitioning of scale j (as 
labeled in Figure 5-1) of the wavelet coefficient mapping $W_j(x,y)$ into $S = 4^K \le (X \cdot Y)/4^{NS}$ 
groups is given by 


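$$W_j^{\Psi}(x,y) = W_j\!\left( 2^{NS-j-1}\left( 2^{K} \left\lfloor \frac{x}{2^{NS-j-1}} \right\rfloor + n \right) + x \bmod 2^{NS-j-1},\;\; 2^{NS-j-1}\left( 2^{K} \left\lfloor \frac{y}{2^{NS-j-1}} \right\rfloor + m \right) + y \bmod 2^{NS-j-1} \right) \qquad (5\text{-}1)$$
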


where $\Psi = n + 2^K m$ specifies the stream number for $n, m \in [0, 2^K - 1]$ and $\lfloor \cdot \rfloor$ is the 
largest integer not exceeding the argument. Note that the scale j is inversely related to the 
frequency: e.g., j = 0 and j = NS - 1 reference the highest and lowest frequency subbands, 
respectively. 
FIGURE 5-2. Zerotree Preserving Wavelet Coefficient Partitioning 
for S = 4 and 3 Wavelet Scales. All coefficients with the same shade 
map to the same stream, and the Xs denote one complete zerotree. 


OFFSET ZEROTREE (OZ) PARTITIONING 

A second partitioning of the wavelet coefficients is illustrated by Figure 5-3 
when S = 4 and NS = 3. In this case, the parent-child relationships of the conventional 
zerotree structure are no longer preserved (as indicated by the boxed Xs in the figure). 
Instead, these relationships are staggered between wavelet scales in such a way that no 
adjacent wavelet coefficients are input to the same encoder stream. In the more general 
case where $S = 4^K \le (X \cdot Y)/4^{NS}$, every coefficient in a given stream is separated from any 
others in the same stream by at least $2^K - 1$ wavelet coefficients. As with the zerotree 
preserving partitioning above, every $2^K$th sample (in both the horizontal and vertical 
directions) of the four coarsest scales is mapped to the same stream. In the offset zerotree 
partitioning, however, this $2^K$th resampling is applied at all other wavelet scales as well 
(i.e., Equation 5-1 is applied to all scales with j = NS - 1). The advantage of this method 
over ZP partitioning is that if one bit stream is terminated prematurely, its reduced- 
resolution coefficients will be surrounded by full resolution coefficients at every scale. 
Thus, the image reconstructed by the inverse wavelet transform retains more of its high 
frequency information when selective transmission errors occur. Unfortunately, the 
zerotrees are now spatially staggered between scales, reducing the inter-scale correlation 
between wavelet coefficients and, consequently, the rate-distortion performance of the 
coding algorithm. 
FIGURE 5-3. Offset Zerotree Wavelet Coefficient Partitioning for 
S = 4 and 3 Wavelet Scales. All coefficients with the same shade 
map to the same stream, and the Xs denote one complete zerotree. 
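
The two partitionings can be compared with a short Python sketch that assigns a stream number to each coefficient. It is an illustration of the description above rather than the implementation used in this work: the function names are hypothetical, the scale index follows the convention that j = 0 is the finest scale and j = NS - 1 the coarsest, and the children of coefficient (x, y) at scale j are assumed to be the four coefficients (2x + a, 2y + b) at scale j - 1, with a and b each 0 or 1.

```python
def zp_stream(x, y, j, num_scales, k):
    """Stream index under zerotree preserving (ZP) partitioning.

    Coefficients are grouped in blocks of side 2**(NS - 1 - j) so that
    every descendant inherits the stream of its coarsest-scale ancestor.
    """
    shift = num_scales - 1 - j
    n = (x >> shift) % (1 << k)
    m = (y >> shift) % (1 << k)
    return n + (1 << k) * m


def oz_stream(x, y, k):
    """Stream index under offset zerotree (OZ) partitioning.

    Every 2**k-th sample at every scale goes to the same stream, so
    neighbouring coefficients always land in different streams.
    """
    n = x % (1 << k)
    m = y % (1 << k)
    return n + (1 << k) * m


# S = 4 streams (k = 1) and three wavelet scales, as in the figures.
NUM_SCALES, K = 3, 1
parent = zp_stream(1, 2, j=2, num_scales=NUM_SCALES, k=K)
children = [
    zp_stream(2 * 1 + a, 2 * 2 + b, j=1, num_scales=NUM_SCALES, k=K)
    for a in (0, 1)
    for b in (0, 1)
]
```

Under ZP, `children` all equal `parent`, which is the zerotree preserving property; under OZ, horizontally or vertically adjacent coefficients never share a stream, which is why a truncated stream is surrounded by full-resolution neighbours.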


STOCHASTIC ANALYSIS 


To evaluate the effectiveness of this family of robust compression algorithms, we 
assume that the coded image is transmitted through a binary symmetric, memoryless 
channel with a probability of bit error given by $\varepsilon$. We would like to know the number of 
bits correctly received in each of the S streams. Because this quantity is itself a random 
variable, we use its mean value to characterize the performance of the different 
algorithms. Because the channel is memoryless, streams terminate independently of each 
other, but the mean values of their termination points are always the same for a specified 
$\varepsilon$. Assuming that the image is compressed to B total bits and that S streams are used, then 
the probability of receiving exactly k of the B/S bits in each stream correctly is given by 


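$$p(k) = \begin{cases} \varepsilon (1-\varepsilon)^{k}, & 0 \le k < B/S \\ (1-\varepsilon)^{k}, & k = B/S \end{cases} \qquad (5\text{-}2)$$
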


which is a valid probability mass function as one can easily verify by summing over all k. 
In Equation 5-2, $(1-\varepsilon)^k$ is the probability that the first k bits are correct, while $\varepsilon$ is the 
probability that the (k+1)th bit is in error. Note that a separate term conditioned on B/S is 
necessary to take into account the possibility that all of the bits in the stream are correctly 
received. The mean value can now be calculated as 




$$m_S = \sum_{k=0}^{B/S} k \, p(k) \qquad (5\text{-}3)$$

On the average, the total number of bits correctly received is $S \cdot m_S$. If B/S is large 
relative to $1/\varepsilon$, then $m_S \approx m_1$ for all S, and, therefore, approximately S times more bits are 
correctly decoded for the robust algorithm than for the conventional one (i.e., S = 1). 
Generally, the gain actually achieved is not this high, but it is nonetheless significant. In 
the “Results” section below, we use Equation 5-3 to analyze the impact of transmission errors on 
the average quality of the reconstructed image for all possible values of S. 
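
The mean number of correctly received bits per stream can be evaluated numerically with a short Python sketch. The function names are illustrative, and the example values (a 512x512 image coded at 1.0 bpp, so B = 262144 bits, with a bit error probability of 0.001) are assumptions chosen for the demonstration. The sketch also compares the direct sum with the closed form (1 - e)(1 - (1 - e)^(B/S))/e, where e is the bit error probability; this closed form is a standard result for a truncated geometric distribution and is not taken from the text.

```python
def stream_pmf(eps, n_bits):
    """Probability of receiving exactly k = 0..n_bits bits correctly in one stream."""
    q = 1.0 - eps
    pmf = [eps * q**k for k in range(n_bits)]  # first error falls on bit k+1
    pmf.append(q**n_bits)                       # every bit of the stream is correct
    return pmf


def mean_correct_bits(eps, n_bits):
    """Mean of the distribution above, by direct summation."""
    return sum(k * p for k, p in enumerate(stream_pmf(eps, n_bits)))


def mean_closed_form(eps, n_bits):
    """Closed form of the same mean (truncated geometric distribution)."""
    q = 1.0 - eps
    return q * (1.0 - q**n_bits) / eps


TOTAL_BITS = 512 * 512  # assumed: 512x512 image at 1.0 bpp
EPS = 1e-3              # assumed bit error probability

m_1 = mean_correct_bits(EPS, TOTAL_BITS)          # conventional coder, S = 1
m_16 = mean_correct_bits(EPS, TOTAL_BITS // 16)   # robust coder, S = 16
total_16 = 16 * m_16
```

For these values the stream length is much larger than the reciprocal of the error probability in both cases, so the per-stream means are nearly equal and the robust coder recovers close to 16 times as many bits on average.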


RESULTS 


For these comparisons, we introduce an error into all streams simultaneously 
according to Equation 5-3, and we allow each stream to self-terminate. Furthermore, we 
assume that the size of the image file is known by the decoder or, equivalently, that 
synchronization information is inserted after each image in a sequence. A 5-level 
transform using the 9/7 biorthogonal wavelet (Reference 5-9) is then applied to the 
image, and all possible robust partitionings are evaluated (S = 1 is, again, the 
conventional EZW coder). From the objective results summarized in Figure 5-4, we note 
that considerable robustness is achieved at the expense of perfect channel rate-distortion 
performance. For the Lena image, the PSNR drops from 39.28 decibels (S = 1) to 
36.13 decibels (S = 256), while for Barbara the equivalent drop is from 34.84 to 
30.09 decibels. With an error rate of $\varepsilon = 10^{-2}$, however, the average improvement of the 
REZW algorithm over a conventional EZW coder is more than 15 decibels for the same 
images. Examining Figure 5-5, we note that the objective performance of the ZP 
partitioning is generally superior to that achieved using offset zerotrees. This superiority 
is even more evident in the subjective quality, as shown by Figures 5-6 and 5-7. Most of 
the artifacts in Figure 5-7 are actually introduced by the arithmetic decoder overrun, and 
if this is eliminated (e.g., the position of the error is known), the ZP and OZ partitionings 
result in almost identical subjective quality at all error rates. Overrun distortion becomes 
less visually significant when more streams are used, as is illustrated by Figures 5-8 and 
5-9, although it still reduces the PSNR more dramatically for the OZ partitioning. For a 
multi-channel transmission method such as orthogonal frequency division multiplexing, 
the OZ partitioning is perceptually superior to the ZP partitioning because the error rates in the 
different channels can be radically different. The advantages of the offset zerotree 
partitioning in this situation are clearly shown in Figures 5-10 and 5-11, where the 
truncation point in each stream has been separately calculated using Equation 5-3. Note 
that here we stop the coding process in each stream immediately after the first bit error 
occurs to prevent the decoder overrun from obscuring the comparison. 
FIGURE 5-4. PSNR (in dB) Versus the Probability of a Bit Error. 
Lena (a) and Barbara (b) coded at 1.0 bpp using ZP partitioning. 




FIGURE 5-5. Comparison of Partitioning for Lena Image Coded at 1.0 bpp. Connected 
lines correspond to OZ partitioning, while symbols correspond to ZP partitioning. Note 
that ZP has slightly better rate-distortion performance in most cases, with the major 
exception being when S = 256 and the error rate is low. 
FIGURE 5-6. Lena Coded to 1.0 bpp Using ZP Partitioning With S = 4 
and Probability of Error $\varepsilon = 10^{-3}$. PSNR = 22.57 dB. 



FIGURE 5-7. Lena Coded to 1.0 bpp Using OZ Partitioning With S = 4 
and Probability of Error $\varepsilon = 10^{-3}$. PSNR = 21.98 dB. 
FIGURE 5-9. Lena Coded to 1.0 bpp Using OZ Partitioning With 
S = 256 and Probability of Error $\varepsilon = 10^{-3}$. PSNR = 33.97 dB. 
FIGURE 5-11. Lena Coded to 1.0 bpp Using OZ Partitioning With S = 4 
and Probability of Errors of $10^{-2}$, $10^{-3}$, $10^{-4}$, and $10^{-5}$ in Each Stream. 
PSNR = 22.70 dB. 
IMPLEMENTATION AND COMPLEXITY 


The robust compression algorithm illustrated by Figure 5-1 can be implemented 
efficiently either on a conventional sequential processor or on an array of MIMD parallel 
processors. A sequential implementation first performs a wavelet transform and then 
executes one of the vertical branches of Figure 5-1 until the appropriate number of bits 
(based on the interleaving structure) is produced. At that point, execution control is 
passed to the next branch, and the process continues in a round robin fashion until the bit 
allocation is exhausted. In the parallel implementation, a parallel wavelet decomposition 
is performed and each branch in the figure is executed independently by separate 
processors. In this case, either the outputs of the processors have to be synchronized to 
achieve proper interleaving or an additional output processor must be used to organize 
the data. 

Neither implementation of the robust algorithm significantly affects the complexity 
of the complete system. Parallel wavelet transforms do have higher computational 
complexity (see Reference 5-10), but this increase is also incurred in a parallel 
implementation of the conventional algorithm (Reference 5-8). Furthermore, the search 
complexity of the REZW coder actually decreases slightly with S because the number of 
significant coefficients trends downward. The only quantifiable disadvantage of the 
robust algorithm is that it requires more temporary storage space because the coders 
operating in each stream must have their own thresholds and adaptive models. In reality, 
however, its most significant drawback is that it is simply more intricate than EZW, 
making it harder to efficiently program. Once programmed, it is just as fast as the 
conventional coder and requires only a small amount of additional memory. 


CONCLUSION 


We have shown here how robustness to transmission errors can be added to an 
embedded image compression algorithm with little or no definable increase in its 
complexity. As the number of partitions, S, increases, the resilience of the coded image to 
transmission errors also increases, but the perfect channel rate-distortion performance of 
the codec decreases. If the bit error rate of the channel is greater than $10^{-6}$, however, the 
performance of our robust compression algorithm is generally better than that of the 
conventional embedded zerotree wavelet coder for some value of S. 
CHAPTER 6. 

INTERFRAME COMPRESSION 


SUMMARY 

In this chapter we consider the problem of compressing video for systems in which 
encoder complexity is a major constraint. While our area of particular concern is that of 
airborne remote sensing, the basic approach developed here is more widely applicable. 
Because of power constraints on a remote platform, video must be transmitted over 
narrow bandwidth communications channels, requiring that the encoder operate at very 
low bit rates. Consequently, a more complex encoder must be used that can exploit both 
spatial and temporal redundancy in the video sequence. To reduce this complexity, we 
introduce a new approach to motion compensation similar to the conventional hybrid 
differential pulse-code modulation (DPCM) transform method, but where the 
compensation is performed outside the feedback loop. Within this framework, many 
specific implementations are possible, and we study a few of them here. While the basic 
idea is conceptually similar to the pan compensation proposed by Taubman and Zakhor, 
our method continually tracks and updates the image in the feedback loop in the same 
way as the conventional hybrid coder. Using both residual energy and reconstruction 
error as metrics, we show that the new motion compensation scheme is actually superior 
to the conventional one, despite achieving an encoder complexity reduction of as much 
as 27%. 


INTRODUCTION 


In remote-sensing applications, severe constraints are placed on the design of the 
video encoder by weight, volume, and power limitations (Reference 6-1). To further 
complicate matters, power and antenna concerns force the communications channel 
through which the video must be transmitted to have a narrow bandwidth, requiring that 
the compression algorithm operate efficiently at low bit rates. In applications involving 
real-time remote control, video latency also becomes a significant systems constraint. 
Unfortunately, achieving good rate-distortion performance with such a restrictive channel 
generally requires a high complexity encoder and results in greater video latency, 
increasing the cost of the remote system and reducing its performance. For example, the 
most successful video compression technique to date, the hybrid DPCM-transform 
algorithm (Figure 6-1), has a far more complex encoder than decoder, exactly the 
opposite of what is desired in this application. To address this deficiency, we introduce a 
new method of motion compensation that eliminates the inverse transform operation 
required in Figure 6-1 and thus results in considerably lower encoder complexity than the 
conventional approach (References 6-2 and 6-3). While we apply this new approach here 
only to sequences having primarily global platform motion, it might also be useful in 
other applications where complexity issues are critical, including mobile video 
teleconferencing. 



FIGURE 6-1. Conventional Hybrid-Transform Video Compression System. 
(a) Encoder. (b) Decoder. 


This chapter is organized as follows. We first discuss conventional solutions to the 
video compression problem, pointing out their disadvantages with respect to low 
complexity applications. Then our new method of motion compensated transform coding 
is introduced, along with a discussion of its trade-offs and practical implementation. We 
then detail the complete video coding algorithm and comparative results, followed by the 
conclusions. 

CONVENTIONAL MOTION COMPENSATION 

SPATIAL DOMAIN COMPENSATION 

The most common form of hybrid-transform video compression performs the motion 
compensation and differencing operations in the spatial domain, as shown in Figure 6-1. 
This technique is used in all of the existing video compression standards, including the 
Moving Picture Experts Group MPEG-1 and MPEG-2 standards for broadcast (References 6-4 and 
6-5) as well as the H.261 and H.263 standards for video teleconferencing (References 6-6 
and 6-7). For these coders, the transform used in Figure 6-1 is an 8x8 blocked DCT. In 
terms of our application, the complexity of the encoder illustrated in Figure 6-1a is too 
high: at least double that of the corresponding decoder. The problem of encoder 
complexity is even worse for low bit rate applications because the relatively simple DCT 
is typically replaced with a more complex wavelet or subband decomposition. Doing this 
improves the rate-distortion performance of the video compression algorithm and 
eliminates many perceptually annoying artifacts (e.g., blocking), but it increases the 
encoder’s complexity even more because the encoder contains both a forward and an 
inverse transform. Thus as better transforms are used to improve the performance of the 
spatial coder, the implementation cost of the encoder increases rapidly. 

Assuming that $F_k$ is the current frame in the video sequence, that an invertible 
transform is used, and that no quantization is applied, the reconstructed frame $\tilde{F}_k$ output 
from the decoder is 

\[ \tilde{F}_k = F_k - \mathrm{MC}\{F_{k-1}\} + \mathrm{MC}\{F_{k-1}\} \tag{6-1} \]

for the method of Figure 6-1, where MC indicates the motion compensation operator. 
Clearly, the frame reconstructed by the decoder is identical to that input to the encoder, 
regardless of what type of motion compensation is performed. This property offers 
flexibility that can be used to improve the rate-distortion performance of the system. 
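
The property stated by Equation 6-1 can be checked numerically. The following is a minimal sketch, not the report's implementation: the random orthonormal matrix `T` stands in for the transform, and `replicate_shift` is a hypothetical, deliberately non-invertible pan compensation that repeats edge rows. With no quantization, the loop of Figure 6-1 still returns the input frame exactly.

```python
import numpy as np

def replicate_shift(frame, dn):
    """Shift rows down by dn pixels, repeating edge rows (not invertible)."""
    rows = np.clip(np.arange(frame.shape[0]) - dn, 0, frame.shape[0] - 1)
    return frame[rows]

def conventional_step(frame, prev_recon, T, dn):
    """One unquantized pass through a loop shaped like Figure 6-1."""
    prediction = replicate_shift(prev_recon, dn)
    coeffs = T @ (frame - prediction)   # forward transform of the spatial residual
    recon = prediction + T.T @ coeffs   # inverse transform, needed in encoder and decoder
    return coeffs, recon

rng = np.random.default_rng(0)
T, _ = np.linalg.qr(rng.standard_normal((8, 8)))  # stand-in orthonormal transform (assumption)
prev = rng.standard_normal((8, 8))
curr = rng.standard_normal((8, 8))
_, recon = conventional_step(curr, prev, T, dn=3)
print(np.allclose(recon, curr))
```

Note that the encoder must evaluate `T.T @ coeffs` itself in order to keep its reference in step with the decoder, which is the cost the out-of-loop structure aims to remove.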


TRANSFORM DOMAIN MOTION COMPENSATION 

Many authors have proposed directly motion compensating the transform 
coefficients in many different ways and for a variety of reasons (References 6-8 through 
6-14). In Reference 6-8, direct compensation of the coefficients was proposed so that 
spatial scalability could be introduced into the coded bit stream. The authors noted that 
the maximally decimated transforms typically used in coding applications are not 
spatially invariant and thus that the compensation operations must be altered when 
applied in the transform domain. Specifically, to compensate the coefficients in a given 
subband, one must, in general, use coefficients from all of the subbands in the image. 
Despite achieving performance equivalent to that of spatial compensation, the method of 
Reference 6-8 is poorly suited to our needs because it also results in a significant 
increase in encoder complexity. Many of the other subband- or transform-based motion 
compensation schemes that have been proposed also increase the encoder’s complexity 
(References 6-9 and 6-12). 

Authors have also proposed the direct implementation of motion compensation in the 
subband domain (References 6-10 and 6-13). Figure 6-2 details the encoder of such a 
compression scheme, in which the image is first decomposed into different frequency 
bands using subband filters. Then motion compensation is performed in each band 
independently, using interpolation to compensate for decimation in the analysis filter 
bank. Where direct comparisons have been performed, however, motion compensation in 
the subband domain has often proven inferior to compensation applied in the spatial 
domain (Reference 6-14). To see why this degradation occurs, consider the simple 1-D 
2-band filter bank shown in Figure 6-3. The operation of motion compensation can be 
implemented using an M-th band lowpass filter Q(z), along with resampling operations as 
shown in Figure 6-4a, to compensate the subbands to 2/M pixel accuracy with respect to 
the original image (d-M/2 is the shift relative to the original image). If M is odd, then 
Figure 6-4a can be redrawn using the noble identities as shown in Figure 6-4b 
(Reference 6-15). Examining this figure, we note that the anti-imaging filter Q(z) has 
now been replaced by $Q(z^2)$, and this allows high frequency replicas of the signal (created 
by upsampling) to pass through. Unfortunately, this analysis does not apply to the 
important case of single pixel shifts in which M equals 2. Nonetheless, one can easily 
confirm that the residual between an unshifted signal and one that has been shifted by d 
pixels and subsequently motion compensated using Figure 6-4 is never zero. Thus, 
despite its low complexity, the concept of direct compensation in the subband domain is 
seriously flawed. 
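
The shift-varying behavior behind this conclusion can be illustrated with a small numerical sketch. The example below is an assumption-laden simplification: it uses a 2-band Haar filter bank (not the filters of the report) with circular indexing, and only tests integer shifts of the subband. A two-sample shift of the input appears as a one-sample shift of the lowpass band, but a one-sample shift of the input does not correspond to any integer shift of that band.

```python
import numpy as np

def haar_lowband(x):
    """Lowpass band of a maximally decimated 2-band Haar filter bank."""
    return (x[0::2] + x[1::2]) / np.sqrt(2.0)

rng = np.random.default_rng(1)
x = rng.standard_normal(16)
low = haar_lowband(x)
low_shift1 = haar_lowband(np.roll(x, 1))  # input moved by one sample
low_shift2 = haar_lowband(np.roll(x, 2))  # input moved by two samples

# An even input shift is an exact shift of the subband.
matches_shift2 = np.allclose(low_shift2, np.roll(low, 1))
# An odd input shift matches no integer shift of the subband.
matches_shift1 = any(np.allclose(low_shift1, np.roll(low, s)) for s in range(low.size))
print(matches_shift2, matches_shift1)
```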



FIGURE 6-2. Video Encoder for In-Subband Motion Compensation. 

FIGURE 6-3. Two-Band Analysis/Synthesis Filter Banks. 

FIGURE 6-4. Synthesis Filter Bank With In-Subband Motion 
Compensation (M/2 Pixel Accuracy). (a) and (b) are equivalent. 


OUT-OF-LOOP MOTION COMPENSATION 


OVERVIEW 

Figure 6-5 illustrates the structure of the proposed motion compensated video 
compression scheme (References 6-2 and 6-3). Note that the motion compensation is still 
performed in the spatial domain, as in the system of Figure 6-1, but that the differencing 
operation is now performed in the subband domain (as in Figure 6-2). Performing the 
differencing in the subband domain eliminates the need for the inverse transform in the 
feedback loop and, consequently, reduces the complexity of the video encoder by as 
much as 50% (depending on how the motion compensation and estimation are 
performed). Furthermore, if the bit stream produced by the quantizer is appropriately 
designed and partitioned, then the spatial resolution of the complete video coder is 
scalable: assuming that a wavelet-based algorithm is used, the decoder can retain in its 
feedback loop only those bits associated with the coarser wavelet scales and still remain 
perfectly synchronized with the encoder in those scales. By performing 
the actual motion compensation in the spatial domain, we overcome the problems 
inherent in the shift-varying nature of maximally decimated filter banks and transforms, 
which were discussed above. 


FIGURE 6-5. Hybrid Video Compression System Using Out-Of-Loop Compensation. 
(a) is the encoder and (b) is the decoder. Note that the motion vectors (MVs) are sent as 
side information to the decoder for correct motion compensation. 


Unfortunately, there is a major drawback to the motion compensation structure of 
Figure 6-5. To see this, one can express the frame $\tilde{F}_k$ reconstructed by the decoder as 

\[ \tilde{F}_k = \mathrm{MC}^{-1}\{\mathrm{MC}\{F_k\} - F_{k-1} + F_{k-1}\} \tag{6-2} \]

assuming again, as in Equation 6-1, that no quantization is performed and that the 
transform is perfect reconstruction. Unlike the conventional motion compensation 
scheme characterized by Equation 6-1, our new method requires that the motion 
compensation operator be invertible. Previously, a motion compensated video 
compression algorithm with a structure similar to Figure 6-5 has been proposed by 
Taubman and Zakhor for use with panning (i.e., translational) motion (Reference 6-16). 
Their algorithm uses a 3-D subband decomposition and compensates for motion only 
over the support interval of the temporal filter bank, modifying the filters at the interval 
boundaries to achieve the invertibility required by Equation 6-2. In this work, we do not 
assume that the motion compensation is performed over a finite set of frames; instead, we 
require that the system continually track and update the motion estimate, regardless of the 
number of differentially predicted frames between successive I-frames (i.e., frames coded 
using only information contained within themselves). 

If the original image is represented by an XxY matrix $F$ and the motion compensated 
image is represented by the matrix $\hat{F}$, then global pan compensation can be viewed as a 
set of matrix multiplications, i.e., 

\[ \hat{F} = V \cdot F \cdot H \tag{6-3} \]

where V compensates for vertical motion and H compensates for horizontal motion. 
Equation 6-2 requires that both V and H be invertible, and assuming this to be true, it 
becomes, for the reconstructed frame $\tilde{F}_k$, 

\[ \tilde{F}_k = V^{-1} \cdot \{V \cdot F_k \cdot H - F_{k-1} + F_{k-1}\} \cdot H^{-1} \tag{6-4} \]

More generally, different V and H matrices can be used to shift each column and row of 
the image separately; we discuss this case in more detail below. Note that if the motion 
compensation is constrained to 1-pixel accuracy, then V and H are permutation matrices 
and their inverses are equal to their transposes. For the low complexity encoders of 
interest to us, this is a very important special case. Further, it is clear that rotational 
motion compensation can be performed within the framework of Equation 6-3 using the 
fast algorithm of Reference 6-17, and that the methods of Figures 6-1 and 6-5 should give 
equivalent results. The trade-offs associated with other forms of motion compensation 
satisfying Equation 6-2 are not so clearly defined. One possible course of action might be 
to search through all possible invertible compensation matrices V and H for each column 
and row of the image in order to find the set that minimizes the energy or entropy of the 
residual image (i.e., the image produced by subtracting the motion compensated input 
image from the previous image in Figure 6-5a). However, this search process would be 
very cumbersome, and the resulting unstructured set of motion vectors would be very 
difficult to code. Consequently, the next section discusses practical ways in which motion 
compensation satisfying Equation 6-2 can be implemented and characterizes some of the 
associated trade-offs. 
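
To make the structure of Equations 6-2 and 6-4 concrete, the sketch below runs one unquantized frame through a loop shaped like Figure 6-5. It is illustrative only: the function names are hypothetical, the orthonormal matrix `T` stands in for the subband transform, and the compensation is an integer circular shift, so that V and H are permutation matrices whose inverses are their transposes. The encoder differences and updates its reference in the transform domain and never applies the inverse transform.

```python
import numpy as np

def encode_frame(frame, ref_coeffs, T, dm, dn):
    """Out-of-loop encoder step: compensate in space, difference in the transform domain."""
    comp = np.roll(frame, (dn, dm), axis=(0, 1))  # invertible (circular) compensation
    residual = T @ comp - ref_coeffs              # differencing on coefficients
    return residual, ref_coeffs + residual        # loop update, no inverse transform

def decode_frame(residual, ref_coeffs, T, dm, dn):
    """Decoder step: update coefficients, invert the transform, undo the compensation."""
    coeffs = ref_coeffs + residual
    comp = T.T @ coeffs                            # inverse of the orthonormal stand-in
    return np.roll(comp, (-dn, -dm), axis=(0, 1)), coeffs

rng = np.random.default_rng(2)
T, _ = np.linalg.qr(rng.standard_normal((8, 8)))  # stand-in transform (assumption)
ref = T @ rng.standard_normal((8, 8))             # coefficients held in the feedback loop
frame = rng.standard_normal((8, 8))
residual, enc_ref = encode_frame(frame, ref, T, dm=2, dn=-1)
recon, dec_ref = decode_frame(residual, ref, T, dm=2, dn=-1)
print(np.allclose(recon, frame), np.allclose(enc_ref, dec_ref))
```

If the compensation in `encode_frame` were replaced by a non-invertible operator such as edge replication, the decoder could no longer recover the frame, which is the drawback noted for this structure.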


PRACTICAL IMPLEMENTATION 


Periodic Pan Compensation (PPC) 

The largest single component of motion in many video sequences is panning motion. 
To compensate for such motion within the framework of Equation 6-2, we force pixels 
that have panned out of the image field to instead wrap around to the opposite edge 
(Reference 6-2). This is equivalent to assuming that the finite-extent image is instead an 
infinitely periodic image (i.e., F in Figure 6-6); thus, we call it periodic pan compensation 
(PPC). If F(x,y) is the XxY image to be compensated using the integer motion vector 
$(\Delta m, \Delta n)$, then PPC can be implemented as follows: 

\[ \mathrm{PPC}\{F(x,y)\} = F((x - \Delta m) \bmod X,\ (y - \Delta n) \bmod Y) \tag{6-5} \]
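
where “mod” is the standard modulo operation. Periodic pan compensation can also be 
described by Equation 6-3 with matrices 

\[ V_{\Delta n,Y} = \begin{cases}
\begin{bmatrix} 0_{\Delta n,\,Y-\Delta n} & I_{\Delta n,\,\Delta n} \\ I_{Y-\Delta n,\,Y-\Delta n} & 0_{Y-\Delta n,\,\Delta n} \end{bmatrix}, & \Delta n > 0 \\[2ex]
\begin{bmatrix} 0_{Y+\Delta n,\,-\Delta n} & I_{Y+\Delta n,\,Y+\Delta n} \\ I_{-\Delta n,\,-\Delta n} & 0_{-\Delta n,\,Y+\Delta n} \end{bmatrix}, & \Delta n < 0
\end{cases} \tag{6-6} \]

and 

\[ H_{\Delta m,X} = \begin{cases}
\begin{bmatrix} 0_{X-\Delta m,\,\Delta m} & I_{X-\Delta m,\,X-\Delta m} \\ I_{\Delta m,\,\Delta m} & 0_{\Delta m,\,X-\Delta m} \end{bmatrix}, & \Delta m > 0 \\[2ex]
\begin{bmatrix} 0_{-\Delta m,\,X+\Delta m} & I_{-\Delta m,\,-\Delta m} \\ I_{X+\Delta m,\,X+\Delta m} & 0_{X+\Delta m,\,-\Delta m} \end{bmatrix}, & \Delta m < 0
\end{cases} \tag{6-7} \]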

where I and 0 are the identity and zero matrices, respectively, and the subscripts indicate 
the matrix dimensions. Note that the periodic extension shown in Figure 6-6 is the same 
as that used to implement circular convolution in Reference 6-18. Thus, if the subband or 
wavelet filters are circularly convolved with the image, then using PPC will not adversely 
affect the transform coding gain. To compensate for translation, a square window 
defining the image fed into the transform in Figure 6-5a is moved around the infinite 
extent image of Figure 6-6 as appropriate for the motion. Clearly, periodic pan 
compensation as shown in Figure 6-6 will perform less efficiently in the part of the 
window that extends outside the center repetition of F (i.e., the original finite-extent 
image) because there is likely to be less correlation between the wavelet coefficients 
generated by these pixels and the equivalent coefficients maintained in the feedback loop. 
We note, however, that this loss of efficiency does not accumulate with succeeding 
frames of the video sequence. After quantization and coding, an approximation of the 
region outlined by the square box in Figure 6-6 becomes the reference image. If the next 
input frame is shifted as indicated by the gray box, then only the region shown in gray 
will have a higher residual energy (and, likely, reduced coding performance) because it 
represents the area in which the two frames do not overlap. 
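
Periodic pan compensation as described above amounts to a circular shift, which the following sketch expresses both directly and as the matrix product of Equation 6-3. The array layout `F[y, x]` (rows indexed by y) and the helper names are assumptions made for illustration. The shift matrices are permutations, so their inverses equal their transposes, and applying the negated motion vector undoes the compensation.

```python
import numpy as np

def ppc(F, dm, dn):
    """Circular pan compensation: out[y, x] = F[(y - dn) % Y, (x - dm) % X]."""
    return np.roll(F, shift=(dn, dm), axis=(0, 1))

def ppc_matrices(dm, dn, X, Y):
    """Permutation matrices V (vertical) and H (horizontal) such that V @ F @ H == ppc(F)."""
    V = np.roll(np.eye(Y), dn, axis=0)
    H = np.roll(np.eye(X), dm, axis=1)
    return V, H

F = np.arange(20, dtype=float).reshape(4, 5)  # Y = 4 rows, X = 5 columns
V, H = ppc_matrices(dm=2, dn=-1, X=5, Y=4)
print(np.array_equal(V @ F @ H, ppc(F, 2, -1)))
print(np.array_equal(ppc(ppc(F, 2, -1), -2, 1), F))
```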

FIGURE 6-6. Diagram of Periodic Pan Compensation, 
Where F is the Periodically Repeated Original Image and 
$\hat{F}$ is the Compensated Image Input to the Transform. 


By making a few reasonable assumptions about the statistics of the video sequence, 
we can compare pan compensation implemented with the conventional spatial 
differencing of Figure 6-1 to that implemented using the out-of-loop approach shown in 
Figure 6-5. Because pan compensation is implemented separably in the horizontal and 
vertical directions, it suffices here to consider only a 1-D system. In addition, because the 
adverse effects of PPC are not cumulative, we need only consider the compensation of a 
single frame. Assume that the input $x$ is a random vector in $R^N$ with 

\[ \mathrm{mean} = E\{x\} = 0 \tag{6-8} \]

and 

\[ \mathrm{cov} = \begin{bmatrix}
\sigma^2 & \alpha\sigma^2 & \alpha^2\sigma^2 & \cdots \\
\alpha\sigma^2 & \sigma^2 & \alpha\sigma^2 & \cdots \\
\alpha^2\sigma^2 & \alpha\sigma^2 & \sigma^2 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix} \tag{6-9} \]

or to put it another way, 

\[ \mathrm{cov}_{j,k} = E\{x(j) \cdot x(k)\} = \alpha^{|j-k|} \cdot \sigma^2 \tag{6-10} \]

where $E\{\cdot\}$ is the expectation operator and $0 < \alpha < 1$. We further assume that the previous 
input vector $\tilde{x}$ also satisfies Equations 6-8 and 6-9 with the additional condition on the 
cross correlation that 

\[ E\{\tilde{x}(j) \cdot x(k)\} = \alpha^{|j-k+\Delta n|} \cdot \sigma^2 \tag{6-11} \]

assuming that the input vector is shifted by $\Delta n$ with respect to $\tilde{x}$. If the new input truly is 
just a shifted version of the previous input, then the assumption that $\tilde{x}$ also satisfies 
Equations 6-8 through 6-11 is accurate. Equation 6-9 states that the statistical correlation 
between elements in the signal decreases as the distance between them increases. This 
common statistical image model is known to hold well over limited regions of an image, 
but we apply it broadly here in order to characterize the worst case performance of PPC 
relative to conventional pan compensation. 

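The pan compensation for Figure 6-1 can also be formulated as a matrix 
multiplication similar to Equation 6-3 with 

\[ V_{\Delta n} = \begin{cases}
\begin{bmatrix} 1_{\Delta n,\,1} & 0_{\Delta n,\,N-1} \\ I_{N-\Delta n,\,N-\Delta n} & 0_{N-\Delta n,\,\Delta n} \end{bmatrix}, & \Delta n > 0 \\[2ex]
\begin{bmatrix} 0_{N+\Delta n,\,-\Delta n} & I_{N+\Delta n,\,N+\Delta n} \\ 0_{-\Delta n,\,N-1} & 1_{-\Delta n,\,1} \end{bmatrix}, & \Delta n < 0
\end{cases} \tag{6-12} \]

where 1 is a matrix of all ones. Note that we use here the freedom provided by 
Equation 6-1 to extend the compensated vector (or, in general, image) by repeating 
elements in regions uncovered by the motion: given our statistical model, this is the best 
course of action. With $\hat{x} = V_{\Delta n} \cdot \tilde{x}$ denoting the compensated version of the 
previous input $\tilde{x}$, the residual vector is given by 

\[ e = x - \hat{x} \tag{6-13} \]

and the corresponding expected squared error by 

\[ E\{f\} = E\{e^t \cdot e\} = E\{(x - \hat{x})^t \cdot (x - \hat{x})\} \tag{6-14} \]

where $(\cdot)^t$ is the transpose operation and $f$ is simply the sum of the squared errors. 
Simplifying Equation 6-14 using the identity $x^t \cdot x = \mathrm{trace}(x \cdot x^t)$ and the linearity of the 
trace and expectation operators, we find that 

\[ E\{f_c\} = 2N\sigma^2 - 2\,\mathrm{trace}(E\{V_{\Delta n} \cdot \tilde{x} \cdot x^t\}) = 2N\sigma^2 - 2\,\mathrm{trace}(V_{\Delta n} \cdot E\{\tilde{x} \cdot x^t\}) \]

where $f_c$ is the error in the case of conventional pan compensation. Finally, applying 
Equation 6-11 and simplifying, 

\[ E\{f_c(\alpha,\Delta n)\} = 2\sigma^2 \left[ |\Delta n| - \frac{\alpha}{1-\alpha}\left(1 - \alpha^{|\Delta n|}\right) \right] \tag{6-15} \]

which of course equals zero if $|\Delta n| = 0$. 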

Starting with Equation 6-14 and using Equation 6-6 in place of Equation 6-12, we can write a 
similar expression for the case in which periodic pan compensation is used, i.e., 

\[ E\{f_{ppc}\} = 2N\sigma^2 - 2\,\mathrm{trace}(V_{\Delta n,N} \cdot E\{\tilde{x} \cdot x^t\}) \tag{6-16} \]

where $\tilde{x}$ is the previous input vector and V now performs a full circular shift of the rows 
of the correlation matrix. Applying 
Equation 6-11, circularly shifting, and evaluating the trace, we find that 

\[ E\{f_{ppc}(\alpha,\Delta n)\} = 2\sigma^2 \cdot |\Delta n| \cdot (1 - \alpha^N) \tag{6-17} \]


The difference between Equations 6-17 and 6-15 represents the loss of efficiency 
incurred by using PPC over conventional compensation and is given approximately by 

\[ \mathrm{diff} = \frac{2\sigma^2\alpha}{1-\alpha}\left(1 - \alpha^{|\Delta n|}\right) \tag{6-18} \]


for large N and reasonable values of the regional correlation coefficient $\alpha$. To 
characterize the performance degradation graphically, we form the ratio 

\[ \mathrm{ratio} = \frac{N\sigma^2 - \mathrm{diff}}{N\sigma^2} = 1 - \frac{2\alpha}{N(1-\alpha)}\left(1 - \alpha^{|\Delta n|}\right) \tag{6-19} \]

and plot this in Figure 6-7 for N = 512 (the length of one line in a typical image) and a 
few values of $\alpha$. This ratio basically characterizes the increase in residual energy relative to the 
energy of the original input signal, and we use it here to eliminate the dependence on $\sigma^2$. 
From the figure, we note that the penalty paid for using PPC (i.e., the reduction from 1.0) 
is not truly severe even for large values of $\Delta n$ and $\alpha$, leading us to expect that the method 
will perform well in practice. The performance of PPC relative to conventional pan 
compensation improves as the regional correlation in the statistical image model 
decreases because the conventional algorithm’s flexibility in filling the part of the frame 
uncovered by motion becomes less significant. 
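
The closed-form results of this analysis can be cross-checked numerically. The sketch below is written under our reading of the derivation: the expected errors are taken as 2σ²[|Δn| − α(1 − α^|Δn|)/(1 − α)] for conventional compensation and 2σ²|Δn|(1 − α^N) for PPC, and the cross-correlation between the previous and current inputs is taken as α^|j−k+Δn|σ². The function names are hypothetical. `trace_check` evaluates 2Nσ² − 2·trace(V·C) directly from explicit compensation matrices, so it can be compared with the closed forms.

```python
import numpy as np

def expected_error_conventional(alpha, dn, sigma2=1.0):
    """Closed form for conventional pan compensation with edge replication."""
    d = abs(dn)
    return 2.0 * sigma2 * (d - alpha * (1.0 - alpha**d) / (1.0 - alpha))

def expected_error_ppc(alpha, dn, N, sigma2=1.0):
    """Closed form for periodic pan compensation."""
    return 2.0 * sigma2 * abs(dn) * (1.0 - alpha**N)

def ratio(alpha, dn, N):
    """Large-N ratio (N*sigma^2 - diff) / (N*sigma^2); sigma^2 cancels."""
    diff = 2.0 * alpha * (1.0 - alpha**abs(dn)) / (1.0 - alpha)
    return 1.0 - diff / N

def trace_check(alpha, dn, N, periodic, sigma2=1.0):
    """Evaluate 2*N*sigma^2 - 2*trace(V @ C) with an explicit compensation matrix V."""
    j, k = np.indices((N, N))
    C = sigma2 * alpha ** np.abs(j - k + dn)  # assumed cross-correlation model
    idx = np.arange(N) - dn
    src = idx % N if periodic else np.clip(idx, 0, N - 1)  # wraparound vs. replication
    V = np.zeros((N, N))
    V[np.arange(N), src] = 1.0
    return 2.0 * N * sigma2 - 2.0 * np.trace(V @ C)

print(expected_error_conventional(0.9, 5), trace_check(0.9, 5, 64, periodic=False))
print(expected_error_ppc(0.9, 5, 64), trace_check(0.9, 5, 64, periodic=True))
print(ratio(0.95, 16, 512))
```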

FIGURE 6-7. Difference in the Expected Error Between 
Conventional and Periodic Pan Compensation for a 
Variety of $\alpha$. 


Periodic Generalized Pan Compensation (P-GPC) 

The concept of periodic pan compensation can be easily generalized to allow for 
more complex motion while retaining the same basic framework (Reference 6-3). 
Specifically, each individual row and column of the image can be separately 
compensated, as shown in Figure 6-8. This flexibility allows one to compensate for much 
more complex global motions including rotation (using the method proposed in 
Reference 6-17). In the context of P-GPC, Equation 6-5 can be rewritten as 

\[ \mathrm{P\text{-}GPC}\{F(x,y)\} = F((x - \Delta m_x) \bmod X,\ (y - \Delta n_y) \bmod Y) \tag{6-20} \]

where $\Delta m_x$ and $\Delta n_y$ are the individual motion vectors corresponding to each row and 
column, respectively. Note that the analysis of PPC discussed previously is also 
applicable here because it only considers a single line of the image (an average over all 
lines can be computed for P-GPC to estimate the performance decrease for the entire 
image). 
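
A minimal sketch of line-by-line periodic compensation follows. It assumes an array layout `F[y, x]`, one horizontal shift per row and one vertical shift per column, and it applies the row shifts first and the column shifts second. That ordering, like the function names, is an assumption made for illustration; the two passes do not commute in general, so the inverse must undo them in reverse order.

```python
import numpy as np

def shift_rows(F, dm):
    """Circularly shift row y horizontally by dm[y] pixels."""
    Y, X = F.shape
    cols = (np.arange(X)[None, :] - np.asarray(dm)[:, None]) % X
    return F[np.arange(Y)[:, None], cols]

def shift_cols(F, dn):
    """Circularly shift column x vertically by dn[x] pixels."""
    Y, X = F.shape
    rows = (np.arange(Y)[:, None] - np.asarray(dn)[None, :]) % Y
    return F[rows, np.arange(X)[None, :]]

def pgpc(F, dm, dn):
    """Row pass followed by column pass (ordering is an assumption)."""
    return shift_cols(shift_rows(F, dm), dn)

def pgpc_inverse(G, dm, dn):
    """Undo the column pass first, then the row pass."""
    return shift_rows(shift_cols(G, np.negative(dn)), np.negative(dm))

F = np.arange(30).reshape(5, 6)
dm = [0, 1, 2, -1, 3]      # one horizontal shift per row
dn = [1, 0, -2, 2, 0, 1]   # one vertical shift per column
G = pgpc(F, dm, dn)
print(np.array_equal(pgpc_inverse(G, dm, dn), F))
```

When every row and every column uses the same shift, this reduces to the plain circular shift of periodic pan compensation.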

FIGURE 6-8. Generalized Periodic Pan Compensation for 
Horizontal (a) and Vertical (b) Motion. The dashed lines 
are the motion compensated lines and the gray areas are 
where the lines wrap around. 


Region-Based Motion Compensation 

The concepts discussed above can also be applied over specified regions within the 
image. Doing this compensates for motion locally, which can be advantageous when the 
image contains multiple, independently moving objects. But the pixel wraparound 
required to satisfy Equation 6-2 can greatly increase the residual error if the regions are 
small relative to the amount of motion. Because the amount of pixel wraparound 
accumulates with successive frames, the sizes of the motion compensation regions and 
the average amount of motion in them act to limit the number of consecutive frames that 
can be coded before the error residual begins to increase dramatically. In practice, if this 
form of region-based motion compensation is applied to an unknown image sequence, the 
encoder must closely monitor the error residual and insert an I-frame (or a portion of one 
over a region) when it grows too large. Finally, one should note that at very low bit rates 
the block-circular shifting at the decoder’s output is likely to introduce false edges into 
the resulting video. These artifacts will be present even if a transform having overlapped 
basis functions (e.g., a wavelet) is used. 


Compensating for Subpixel Motion 

So far we have only considered systems capable of compensating for integer 
amounts of video motion. While integer-accurate compensation is well suited to 
applications where complexity is critical, the energy of the residual image can often be 
further reduced by compensating for subpixel motion. To minimize computational 
complexity, we initially considered compensation based on a simple linear interpolation 
filter. Unfortunately, such filters have zeros at z = -1 in the complex plane, making stable 
inversion (a requirement in our application) impossible. Therefore, we have adopted the 
approximately allpass interpolator developed in Reference 6-19. Equation 6-20 can be 
modified to include subpixel interpolation as follows: 

\[ \mathrm{S\text{-}GPC}\{F(x,y)\} = q_{\beta(x)}\, q_{\beta(y)} [F((x - \Delta m_x) \bmod X,\ (y - \Delta n_y) \bmod Y)] \tag{6-21} \]

where q[k] represents the infinite impulse response interpolation filter and $\beta(\cdot)$ is the 
subinteger shift (i.e., $\beta(\cdot) \in [-1/2, 1/2]$) for the specified row or column. In our application of 
the allpass interpolator, we have found that a window size of 10 along with three 
iterations of Equation 6-12 in Reference 6-19 provide adequate performance. One should 
note, however, that even with these simplifications the computational complexity of this 
approach is still quite high. 
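
The difficulty with inverting a linear interpolator can be seen from its frequency response. The sketch below is an illustration under simplifying assumptions: a two-tap interpolator with weights (1 − β, β) applied by circular convolution to a length-16 line. For a half-pixel shift the response is exactly zero at z = −1 (the highest frequency bin), so no stable inverse exists; for other fractional shifts the response only dips there, and an inverse filter would amplify that band.

```python
import numpy as np

def circular_response(beta, N):
    """DFT of the 2-tap linear interpolator (1 - beta, beta) under circular convolution."""
    h = np.zeros(N)
    h[0], h[1] = 1.0 - beta, beta
    return np.fft.fft(h)

half = np.abs(circular_response(0.5, 16)).min()      # response magnitude at z = -1
quarter = np.abs(circular_response(0.25, 16)).min()  # smallest magnitude for a quarter-pixel shift
print(half, quarter)
```

An allpass interpolator avoids this by having unit magnitude response at every frequency, which is why its inverse is well behaved.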


VIDEO CODING ALGORITHM 

MOTION ESTIMATION 

To minimize complexity in the video encoder, one would prefer to estimate motion 
vectors using ancillary information such as sensor gimbal movements and platform 
navigation information. Unfortunately, such information does not exist for the video 
sequences available to us. Therefore the video coding algorithm presented here (and used 
to generate the results below) also performs motion estimation, but we note that this 
added computational cost may not be germane to the final application. 

All of our motion estimates are made by comparing 128x128 regions of the current 
video frame with the same regions in the previous frame. It is necessary to actually store 
portions of the previous frame because the out-of-loop compensation structure of 
Figure 6-5 never reconstructs the actual image produced by the decoder (i.e., it only 
reconstructs the decoder’s approximation of the transform coefficients). Integer accuracy 
motion estimates for each region are then formed using FFT-based correlation between 
the frames. We selected this method of matching because it tended to provide better 
estimates than spatial searches for our video sequences. To compute the motion estimates 
for pan compensation, we use a single region centered in each video frame, while for the 
two other methods we use five different regions. Figure 6-9 shows the correlation regions 
used for the GPC method as well as those used with a region-based motion compensation 
method of the type discussed previously. For GPC, a different horizontal (vertical) 
motion vector is calculated for each row (column) by interpolating the five motion 
vectors generated by correlation in Figure 6-9a. For example, the horizontal motion 
vector for line 8 in the figure is determined by first averaging the horizontal motions of 
regions 0, 1, and 3 (the magnitude of which is indicated by the dashed arrow) and linearly 
interpolating this average with the horizontal component of the motion vector in region 4. 
This process is extrapolated outward past the center of region 4 (along the dotted line) to 
estimate the motion vector for line y. All other horizontal and vertical motion vectors are 
computed in the same way using vector components from the appropriate regions. In 
Figure 6-9a, our implementation of GPC assumes a smooth, piecewise-linear motion 
model. We have found that smoothness is especially important in achieving good spatial 
compression if a transform having overlapped basis functions is used, a fact also noted in 
Reference 6-9. 
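
FFT-based correlation of the kind described above can be sketched as follows. This is a generic circular cross-correlation peak search, not the report's code; the function name, the mean removal, and the test data are assumptions. It returns the integer shift (Δm, Δn) that best aligns a region of the previous frame with the same region of the current frame.

```python
import numpy as np

def estimate_shift(prev, curr):
    """Integer motion (dm, dn) such that curr is approximately prev circularly shifted."""
    P = np.fft.fft2(prev - prev.mean())
    C = np.fft.fft2(curr - curr.mean())
    corr = np.fft.ifft2(C * np.conj(P)).real  # circular cross-correlation surface
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    Y, X = corr.shape
    if dy > Y // 2:                            # map peak index to a signed shift
        dy -= Y
    if dx > X // 2:
        dx -= X
    return int(dx), int(dy)

rng = np.random.default_rng(3)
prev = rng.standard_normal((32, 32))
curr = np.roll(prev, (3, -5), axis=(0, 1))    # 3 rows down, 5 columns left
print(estimate_shift(prev, curr))
```

In practice the regions are not exact circular shifts of each other, so the peak is less sharp than in this synthetic case.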

Motion estimates for the region-based compensation approach are calculated using 
the dashed squares shown in Figure 6-9b. In this case, the motion vector for region 0 
(mv0) is used to estimate the global panning motion, while those of regions 1 through 4 
(mv1,...,mv4) are used to estimate the motion in the associated quadrant of the image. To 
compensate the current video frame, this encoder first uses mv0 to perform periodic pan 
compensation on the entire frame and then uses the differences (mv1-mv0, mv2-mv0, 
mv3-mv0, and mv4-mv0) to perform PPC in each quadrant separately. For comparison, 
we also use the same motion estimates to compensate a conventional hybrid DPCM- 
transform coder but with pixel replication to fill in the uncovered regions. 
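
The two-stage, region-based compensation described above can be sketched as follows. The quadrant ordering, the (dm, dn) tuple convention, and the function names are assumptions made for illustration. A global circular shift by mv0 is followed by a circular shift of each quadrant by its difference from mv0, and the inverse undoes the quadrant shifts before the global one.

```python
import numpy as np

def quadrant_slices(Y, X):
    """Row/column slices of the four image quadrants (ordering is hypothetical)."""
    top, bottom = slice(0, Y // 2), slice(Y // 2, Y)
    left, right = slice(0, X // 2), slice(X // 2, X)
    return [(top, left), (top, right), (bottom, left), (bottom, right)]

def region_ppc(F, mv0, mvs):
    """Global PPC by mv0, then PPC within each quadrant by (mv_i - mv0)."""
    out = np.roll(F, (mv0[1], mv0[0]), axis=(0, 1))
    for (rows, cols), (dm, dn) in zip(quadrant_slices(*F.shape), mvs):
        out[rows, cols] = np.roll(out[rows, cols], (dn - mv0[1], dm - mv0[0]), axis=(0, 1))
    return out

def region_ppc_inverse(G, mv0, mvs):
    """Undo the quadrant shifts first, then the global shift."""
    out = G.copy()
    for (rows, cols), (dm, dn) in zip(quadrant_slices(*G.shape), mvs):
        out[rows, cols] = np.roll(out[rows, cols], (mv0[1] - dn, mv0[0] - dm), axis=(0, 1))
    return np.roll(out, (-mv0[1], -mv0[0]), axis=(0, 1))

F = np.arange(64).reshape(8, 8)
mv0 = (2, -1)                                 # (dm, dn) global pan
mvs = [(3, -1), (2, 0), (1, -2), (2, -1)]     # per-quadrant vectors
G = region_ppc(F, mv0, mvs)
print(np.array_equal(region_ppc_inverse(G, mv0, mvs), F))
```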

FIGURE 6-9. (a) Shows Motion Estimation Method for Generalized Pan Compensation; 
(b) Shows Motion Estimation for Region-Based Approach. Small dashed squares are the 
regions used in each frame for correlations. 


TRANSFORM-BASED RESIDUAL ENCODER 

To evaluate the proposed motion compensation algorithm, we use a 5-level, 2-D 
wavelet transform that has been implemented separably with 1-D 9/7 biorthogonal 
wavelets (9 and 7 tap analysis low and high pass filters, respectively) (Reference 6-20). 
Symmetric extension is applied to the wavelet filters at image borders except as noted in 
Figure 6-10, where circular convolution is also used (Reference 6-20). For rate-distortion 
comparisons, we use the EZW algorithm to code and quantize both the I- (intra) and P- 
(predicted) frames in the sequence (Reference 6-21). In addition to achieving good rate- 
distortion performance, this algorithm generates an embedded bit stream (i.e., 
information is transmitted in order of importance) that, in a complete video coder, 
simplifies the rate-buffer control. While functional, the video coding algorithm specified 
here has been primarily designed for the purpose of comparing the various motion 
compensation schemes previously discussed. For the results presented below, we use 
fixed bit allocations of 0.2 bpp for the first image in the sequence (the I-frame) and 0.1 
bpp for all of the subsequent difference frames (P-frames). A complete low bit-rate video 
coder might also make use of bi-directionally motion compensated predicted and/or 
interpolated frames (B-frames), and it could possibly also vary the bit allocation to 
different frames in the sequence in an effort to maintain uniform quality in the 
reconstructed video sequence. These enhancements would greatly increase the 
performance of the basic video coding algorithm described here but only at the expense 
of increased encoder complexity. 




FIGURE 6-10. (a) Energy of Difference Image (No Quantization) Versus 
Frame Number; (b) Mean Squared Error Of Reconstruction With 
Quantization Versus Frame Number. Dashed is non-motion compensated, 
solid is PPC with symmetric extension, circles are PPC with circular 
convolution, and dotted is conventional pan compensation. 


RESULTS 


The video to be used for these comparisons is an aerial sequence of a power plant 
consisting of thirty 512x512 frames collected with an imaging infrared camera. In our 
comparisons, we process only one field of each frame (i.e., every other line of an image) 
to eliminate any degradation in coder performance due to the camera’s interlaced scan. 
The first and last frames of the sequence are shown in Figures 6-11 and 6-12, 
respectively, and the results of the comparisons are plotted in Figures 6-10, 6-13, and 
6-14 for the different motion compensation methods proposed earlier. In these plots, 
“ENERGY” refers to the sum of the squared pixel values divided by the total number of 
pixels, and “MSE” refers to the mean squared error between the original image and its 
reconstruction (using the coding algorithm specified above). Studying Figure 6-10a, we 
note that PPC in the context of Figure 6-5 and conventional pan compensation in the 
context of Figure 6-1 both provide considerable reductions in residual energy over the 
uncompensated difference image (dashed line). In fact, the residual energy reductions 
achieved with any form of motion compensation are very similar. Comparisons of the 
MSEs, however, show that PPC combined with the out-of-loop structure performs far 
better than conventional pan compensation as the frame number increases (the average 
MSE is 270 versus 371). The same phenomena also occur with generalized pan 
compensation (Figure 6-13) and block-based compensation (Figure 6-14). Why? We find 
that the energy of the transformed residual image is concentrated into fewer coefficients 
when an out-of-loop video encoder is used in place of a conventional one. The poor 
energy concentration of the conventional hybrid DPCM-transform coder is a direct result 
of spatial energy spreading in the transform: i.e., the residual energy is contained in a few 
large pixels after the subtraction of Figure 6-1, but the transform spreads it out over a 
number of wavelet scales (subbands) and a number of coefficients in a given scale. Thus, 
the zerotree spatial encoder must transmit more significant coefficients, making its 
entropy coding (the zerotree root symbol plus the arithmetic encoder) less effective. Also 
plotted in Figure 6-10 is the case in which circular convolution is applied at the image 
borders in place of symmetric extension; we note that circular convolution is slightly 
better (as predicted previously) but not by very much: its average MSE is 267 versus 270 
for symmetric extension. 
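
The two metrics used in these comparisons follow directly from their definitions in the text. The sketch below is a plain restatement in code, with hypothetical function names: the residual energy is the sum of squared pixel values divided by the number of pixels, and the MSE is the mean squared difference between an original frame and its reconstruction.

```python
import numpy as np

def energy(residual):
    """Sum of squared pixel values divided by the number of pixels."""
    r = np.asarray(residual, dtype=np.float64)
    return float(np.mean(r ** 2))

def mse(original, reconstruction):
    """Mean squared error between a frame and its reconstruction."""
    a = np.asarray(original, dtype=np.float64)
    b = np.asarray(reconstruction, dtype=np.float64)
    return float(np.mean((a - b) ** 2))

print(energy([[1, 2], [3, 4]]), mse([[0, 0]], [[3, 4]]))
```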

Figure 6-13 compares the out-of-loop and conventional motion compensation 
methods using generalized pan compensation (periodically implemented in the first case) 
as presented earlier, with the motion estimation methodology described above. These 
results are very similar to those of Figure 6-10, with the out-of-loop compensation 
scheme offering the best rate-distortion performance (average MSE of 273 versus 337). 
In Figures 6-15 and 6-16 one also notes that by the end of the sequence, the reconstructed 
video has degraded much more severely for the conventional video coder than it has for 
the out-of-loop coder. Subpixel P-GPC (using the scheme discussed earlier) is also shown 
in Figure 6-13 (circles). Comparing this to integer P-GPC we find that the average MSE 
decreases by only three, but this is not surprising because the input image sequence 
appears to have been spatially sampled at a relatively high rate compared to its actual 
resolution. In addition, the MSE is increased by between 5 and 15 by the inverse subpixel 
interpolation filter because it is imperfect near the image borders. 

FIGURE 6-12. Last (30th) Frame of Image Sequence. 

FIGURE 6-13. (a) Difference Energy and (b) MSE Versus Frame Number. Solid is 
repeated from Figure 6-10 for reference, dashed is generalized periodic pan 
compensation, circles are subpixel P-GPC, and dotted is non-periodic GPC implemented 
within a conventional hybrid DPCM-transform framework (i.e., Figure 6-1). 




FIGURE 6-14. (a) Difference Energy and (b) MSE Versus Frame Number. Solid line is 
repeated from Figure 6-12, dashed is region-based motion compensation using PPC and 
the “out-of-loop” framework (i.e., Figure 6-5), and dotted is region-based motion 
compensation using the conventional framework. 

FIGURE 6-16. Frame 30 Using Conventional Approach With GPC. 

Comparisons for the block-based motion compensation approach (discussed above) 
are illustrated objectively by Figure 6-14 and subjectively by Figures 6-17 and 6-18. 
Again, we note that the energy residuals for the two methods are almost identical while 
the MSE of the out-of-loop compensation approach is better (276 versus 287). For this 
sequence, nothing is gained by using a block-based approach over simple pan 
compensation, indicating that most of the motion in the sequence is translational. Note 
that at the bit rates used here, some artifacts do occur in the reconstructed image sequence 
using both the out-of-loop and conventional approaches. Unfortunately, these artifacts 
tend to be most noticeable for the out-of-loop motion compensation. This happens 
because the discontinuities introduced by block-based motion compensation tend to be 
smoothed over by the coding process in the conventional approach, while for the out-of- 
loop method they are introduced directly into the reconstructed video sequence. 

The results presented thus far have shown that out-of-loop motion compensation 
achieves rate-distortion performance superior to that of conventional hybrid DPCM- 
transform coding when used in a wavelet-based, fixed rate system. While this is 
excellent, we are still very interested in the encoding complexity: reducing this 
complexity is the key to using interframe coding techniques in a variety of remote video 
applications. With our aerial video sequence, we found above that simple pan 
compensation extracted most of the interframe redundancy. Thus we compare here the 
complexities of the out-of-loop and the conventional approaches only within the context 
of pan compensation. To perform these comparisons, we have timed the execution of 
each algorithm on a Sun Sparc 10 computer; and to analyze the effects of algorithm 
optimization, we have run each case twice: once when compiled with no runtime 
optimization and next when compiled with the maximum amount of optimization. 
Without optimization, the new encoder takes 651 seconds to process the 30 frames in the 
sequence, versus 899 seconds for the conventional encoder, resulting in a 27.6% speed 
increase. Compiler optimization reduces the encoding times to 388 seconds and 481 
seconds for the new and conventional algorithms, respectively—a 19.3% speed increase. 
Note that this speed increase is achieved despite our use of two fairly large discrete 
Fourier transforms (256x256 with frequency domain interpolation) to estimate image 
motion for each frame. If a more efficient estimation technique were used, the speed 
increases would be much larger. Decoder times for the new and conventional algorithms 
are 401 seconds (212 seconds with optimization) and 404 seconds (217 with 
optimization), respectively. Thus, the decoding complexities of the two algorithms are 
essentially identical, as expected. 
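
The percentages quoted above are relative reductions in execution time with respect to the conventional coder. The short Python check below reproduces them from the measured times; the helper name is ours, introduced only for this illustration.

```python
def percent_reduction(new_seconds, conventional_seconds):
    """Relative time saved by the new algorithm, in percent of the conventional time."""
    return 100.0 * (conventional_seconds - new_seconds) / conventional_seconds


# Encoder, 30 frames
print(round(percent_reduction(651, 899), 1))  # 27.6 (no compiler optimization)
print(round(percent_reduction(388, 481), 1))  # 19.3 (maximum optimization)

# Decoder, 30 frames
print(round(percent_reduction(401, 404), 1))  # 0.7 (no compiler optimization)
print(round(percent_reduction(212, 217), 1))  # 2.3 (maximum optimization)
```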

FIGURE 6-18. Frame 30 Using Conventional Approach With Blocks. 

MAINTAINING ROBUSTNESS TO 
TRANSMISSION ERRORS 


In Chapter 5 we introduced an embedded compression algorithm that is inherently 
robust to transmission errors. Unfortunately, this algorithm only extracts the redundancy 
within single video frames (i.e., it does not exploit in any way the large correlation 
between adjacent temporal frames in a video sequence). By exploiting such temporal 
correlation, one can greatly improve compression performance but at the expense of 
transmission error robustness. For example, within the compression frameworks of 
Figures 6-1 and 6-5, a single transmission error can cause catastrophic error propagation 
because it de-synchronizes the feedback loops of the encoder and decoder. 
Conventionally, the only way to recover from this situation is to send I-frame (intraframe 
coded) updates at regular intervals. This has the effect of reducing the compression 
system’s rate-distortion performance. We propose here an alternative to I-frame updates 
that is highly compatible with the REZW still-frame coder discussed in Chapter 5. 
Figure 6-19 illustrates the proposed robust real-time video compression algorithm where 
the “embedded coder” and “embedded decoder” blocks incorporate our basic REZW 
partitioning approach. To speed up the encoding process, the decoder’s wavelet 
coefficient approximations are never directly calculated by the encoder; rather, the error 
residual after encoding is simply added to or subtracted from (depending on the sign of 
the predicted coefficient) the actual predicted coefficient’s value (i.e., after subtraction) to 
calculate these approximations. The key to achieving robust transmission is the use of 
“leaky” prediction. If the weighting factor W is set equal to 1, then we have conventional 
predictive coding. If, on the other hand, it is selected to be less than 1, then we allow 
“leakage” of the original image into the residual. This leakage allows the decoder to 
forget coefficient errors that have been introduced into its frame buffer (by faulty 
transmission or other means). Thus, if an error occurs in the transmitted bit stream, 
REZW encoding ensures that the error does not propagate spatially, while leaky 
prediction ensures that it does not propagate in time. Note that the proposed robust video 
compression algorithm is also fully compatible with the out-of-loop motion compensation 
illustrated by Figure 6-5—one need only add the appropriate out-of-loop (OOL) 
compensation before the wavelet transform in Figure 6-19a and after the inverse wavelet 
transform in Figure 6-19b. 

FIGURE 6-19. Robust Video Compression Using Leaky Prediction, 
(a) is encoder and (b) is decoder. 
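
The effect of the weighting factor W can be illustrated with a minimal Python sketch. It tracks a single wavelet coefficient across frames and, to keep the example short, sends the residual without quantization; both simplifications, and all function and parameter names, are assumptions for illustration rather than a description of the actual coder.

```python
def leaky_encode(coeffs, w):
    """Residuals r_t = x_t - w * p_(t-1), where p is the decoder-side reconstruction."""
    recon = 0.0
    residuals = []
    for x in coeffs:
        r = x - w * recon
        residuals.append(r)
        recon = w * recon + r  # value an error-free decoder would hold
    return residuals


def leaky_decode(residuals, w, error_frame=None, error=0.0):
    """Rebuild the coefficient; optionally corrupt the frame buffer at one frame."""
    recon = 0.0
    out = []
    for t, r in enumerate(residuals):
        recon = w * recon + r
        if t == error_frame:
            recon += error  # simulated transmission error in the frame buffer
        out.append(recon)
    return out


coeffs = [10.0] * 6
for w in (1.0, 0.5):
    decoded = leaky_decode(leaky_encode(coeffs, w), w, error_frame=2, error=4.0)
    print(w, [d - x for d, x in zip(decoded, coeffs)])
# 1.0 [0.0, 0.0, 4.0, 4.0, 4.0, 4.0]
# 0.5 [0.0, 0.0, 4.0, 2.0, 1.0, 0.5]
```

With W = 1 the injected error persists in every later frame; with W = 0.5 it is scaled by W at each frame and dies away. The price is a larger residual, since part of the original coefficient leaks into it.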


CONCLUSION 


We have presented here a new approach to motion compensation that reduces the 
complexity of the video encoder by eliminating the inverse transform operation. If integer 
pixel compensation is desired and the motion vectors are generated by sources external to 
the encoder (i.e., inertial sensors and gimbal angles), then the complexity reduction 
achieved by using the new out-of-loop compensation scheme over the conventional 
hybrid DPCM-transform approaches is 50%. Even with the relatively inefficient (in terms 
of computational complexity) DFT-based motion estimation algorithms used here, actual 
complexity reductions of between 19 and 27% have been achieved. Furthermore, the 
decoder complexity is not increased by the proposed motion compensation scheme. In 
addition, comparisons using a representative video sequence show that the new approach 
has better rate-distortion performance than the conventional method for a variety of 
specific implementations. For remotely sensed aerial video, periodic pan compensation 
appears to be an adequate solution. Nonetheless, this basic methodology still performs as 
well as or better than the conventional hybrid DPCM-transform approach, even when 
used with more sophisticated motion compensation schemes. Finally, we have shown 
how a video compression system can be made robust to transmission errors using leaky 
temporal prediction. In summary, out-of-loop motion compensation offers a significant 
reduction in encoder complexity with no corresponding increase in decoder complexity, 
while at the same time facilitating spatial scalability and error robustness. 

CHAPTER 7. 

REAL-TIME INTRAFRAME ENCODING 


PROCESSOR 

Real-time intraframe encoding of a 512x240 image is performed by a TI DSP, the 
TMS320C80. This single chip contains five processors: four fixed-point Parallel 
Processors (PPs) and one floating-point Master 
Processor (MP). There is a single 64-bit-wide data bus that allows access to the outside 
world. The Transfer Controller (TC), a peripheral inside of the chip, implements all 
transactions of data to and from the chip. Fifty kilobytes of memory internal to the 
TMS320C80 are divided into 25 2-Kbyte RAMs (random access memories). There are 
three Data RAMs, one Parameter RAM, and one instruction cache per PP. The MP has 
one Parameter RAM, two Data Cache RAMs, and two Instruction Cache RAMs. 


HARDWARE 

We decided to use the Spectrum Signal Processing PCIC80 board running at 
40 megahertz. This board provided a single TMS320C80 with 4 megabytes of DRAM, 
4 megabytes of SDRAM, 4 megabytes of VRAM for display, a daughter board interface 
that can accommodate a video acquisition daughter board, and a personal computer (PC) 
host interface. 


TASK PARTITIONING MP VERSUS PP 

Early on we decided that the PPs would not be involved in the data movement and 
coordination but would yield this task to the MP. The MP would therefore be in charge 
of using the TC to move data on and off the chip, and also would signal the PPs when 
data were available. The MP would also handle all of the communication with the PC 
host. The PPs are well suited to pixel manipulation, and the MP is an excellent overseer. 
Choosing this partitioning allowed us to use the exact same code on all the PPs. 


DEVELOPMENT ENVIRONMENT 

Our current development environment is hosted on the PC platform. A JTAG 
emulator directly connects to the TMS320C80 to download and debug code. For our 
demonstrations, we have a PC program running a user interface that downloads and starts 
the program over the PCI interface to the card, which negates the need for the emulator 
during demonstrations. 

HIGH LEVEL DESCRIPTION OF ROUTINES USED 
FOR INTRAFRAME WAVELET CODING 


Master Processor (MP) 

Mp.c is the only routine that runs on the MP. This program’s main tasks are to start 
the PPs, to move data to and from the PPs, to coordinate PPs to perform their selected 
operations, to display raw images and informational text, and to communicate with the 
PC host. The functions are enumerated in detail here: 

1. Copy PP code to SDRAM from DRAM, and start all PPs executing the same 
code. 

2. Initialize and kick off the task that will do all the work, and hand over to this 
task. 

3. Set up the video display and acquisition of video. 

4. Initialize all the packet transfer tables. 

5. Read an image. 

6. Read the current compression ratio. 

7. Display the image. 

8. Perform 3 1/2 level wavelet decomposition on the image. 

9. Send the coefficients to the PPs for maximum value calculation. 

10. Read all PPs’ local maximum values, calculate a global maximum value, and 
write it back to the PPs. 

11. Send groups of six subblocks of the coefficients to each PP and signal the PPs to 
start. 

12. Wait for the resulting bitstream. 

13. Write the bitstream to DRAM. 

14. Append the dynamic values to the bitstream. 

15. Tell the PC host that computation is complete for this frame of video. 

16. Display the frame rate and processing time. 
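
Steps 9 and 10 amount to a two-stage maximum search: each PP finds the largest coefficient magnitude in its share of the data, and the MP combines the four results. The Python sketch below illustrates the idea under stated assumptions. The even split of the coefficients, the function names, and the use of the global maximum to pick a power-of-two starting threshold (the usual practice in EZW-style coders) are our assumptions, not details given in this chapter.

```python
def local_max(block):
    """Largest coefficient magnitude in one PP's share of the data."""
    return max(abs(c) for c in block)


def global_max(coeffs, num_pps=4):
    """Split the coefficients among the PPs, then combine their local maxima."""
    size = (len(coeffs) + num_pps - 1) // num_pps
    blocks = [coeffs[i:i + size] for i in range(0, len(coeffs), size)]
    local = [local_max(b) for b in blocks]  # would run on the PPs
    return max(local)                       # combined on the MP


def initial_threshold(max_mag):
    """Largest power of two not exceeding max_mag (returns 1 if max_mag < 2)."""
    t = 1
    while t * 2 <= max_mag:
        t *= 2
    return t


coeffs = [3, -57, 12, 40, -8, 0, 31, 5]
g = global_max(coeffs)
print(g, initial_threshold(g))  # 57 32
```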

Parallel Processors 

Main.c is the routine that initializes all the pointers, coordinates the decomposition 
of the image, and calls the function to form the bitstream. 

Three routines are used to perform the embedded wavelet decomposition: 
subdec_vert, subdec_horiz, and calc_n_sub_mean. The subdec_vert routine performs the 
vertical decomposition and the subdec_horiz routine performs the horizontal 
decomposition. The calc_n_sub_mean routine performs three functions: it calculates the 
current image's mean; it subtracts the previous image’s mean from the current image 
and converts the result to a 16-bit number; and it shifts the result up 2 places. 
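
The three functions of calc_n_sub_mean can be sketched in Python as follows. This is an illustration only; the actual routine is written in assembly language (mean2.s is presumably its source file). The integer mean and the 8-bit input range are assumptions.

```python
def calc_n_sub_mean(pixels, prev_mean):
    """Return (current mean, mean-removed pixels scaled by 4).

    pixels are assumed to be 8-bit values (0..255); prev_mean is the mean
    computed for the previous image.
    """
    cur_mean = sum(pixels) // len(pixels)   # 1. mean of the current image
    scaled = []
    for p in pixels:
        v = p - prev_mean                   # 2. subtract the previous image's mean
        v <<= 2                             # 3. shift up 2 places (multiply by 4)
        assert -32768 <= v <= 32767         # result must fit a signed 16-bit word
        scaled.append(v)
    return cur_mean, scaled


print(calc_n_sub_mean([0, 128, 255], prev_mean=100))
# (127, [-400, 112, 620])
```

Using the previous image's mean lets the subtraction proceed in the same pass that accumulates the current mean, at the cost of a small residual offset when the scene brightness changes between frames.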

Seventeen routines are used to perform the scanning of the embedded zerotrees and 
to generate the compressed bitstream. 

The main routine that coordinates this process is p_code. It takes care of the 
maximum value calculation and the calling of all the routines necessary to perform the 
bitstream encoding. 

The zerotree formation is done by input_block, which takes a block of coefficients in 
the in-place structure and writes them into another structure in zerotree format. Another 
routine called comp_ztr forms a zerotree root structure that aids in scanning of the 
zerotrees. 
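
To make the zerotree terminology concrete, the sketch below uses the standard EZW parent-child rule on a small square coefficient array and computes, for any coefficient, the largest magnitude among its descendants. A map of this kind is one plausible form for the structure built by comp_ztr, but that is our assumption; the array layout (decomposition down to a single coarsest coefficient at position (0, 0)) and all names are hypothetical.

```python
def children(r, c, size):
    """Four children of (r, c) at the next finer scale; none at the finest scale.

    The coarsest coefficient at (0, 0) needs special handling and is skipped here.
    """
    if (r, c) == (0, 0) or 2 * r >= size or 2 * c >= size:
        return []
    return [(2 * r, 2 * c), (2 * r, 2 * c + 1),
            (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]


def max_descendant(coef, r, c):
    """Largest magnitude among all descendants of (r, c); 0 if it has none."""
    best = 0
    for cr, cc in children(r, c, len(coef)):
        best = max(best, abs(coef[cr][cc]), max_descendant(coef, cr, cc))
    return best


def is_zerotree_root(coef, r, c, threshold):
    """True if the coefficient and all of its descendants are insignificant."""
    return abs(coef[r][c]) < threshold and max_descendant(coef, r, c) < threshold


coef = [[50, 3, 1, 0],
        [4, 2, 2, -1],
        [1, 9, 0, 1],
        [0, 1, -2, 0]]
print(is_zerotree_root(coef, 0, 1, 8))  # True: 3 and its descendants are all below 8
print(is_zerotree_root(coef, 1, 0, 8))  # False: descendant 9 is significant
```

Once a coefficient is known to be a zerotree root, none of its descendants needs to be scanned at the current threshold, which is what makes pruning effective.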

The actual scanning of the zerotrees is done in code_dlist2. Depending upon what 
information this scanning finds, it either calls comp_dbits, new_dbits_nwo, 
new_dbits_wo, or new_dbits2. A subordinate pass is run in subpass, which gives an 
extra refinement to coefficients that have already been chosen as significant. 

The arithmetic coder is implemented in start_model, start_encoding, start_outputing, 
do_syms, encode_symbol, bit_plus_follow, and update_model. 

The output of the encoder is a series of bits that are packed and put into a structure 
by new_output_bit. 
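
A bit-packing routine of the kind described can be sketched as follows. This Python class is illustrative only: the most-significant-bit-first packing order, the zero padding of the final byte, and the class and method names are assumptions, not details of new_output_bit.

```python
class BitWriter:
    """Accumulates single bits and packs them into bytes, MSB first (assumed order)."""

    def __init__(self):
        self.data = bytearray()
        self.current = 0   # bits collected for the byte in progress
        self.count = 0     # number of bits in self.current

    def output_bit(self, bit):
        self.current = (self.current << 1) | (bit & 1)
        self.count += 1
        if self.count == 8:
            self.data.append(self.current)
            self.current = 0
            self.count = 0

    def flush(self):
        """Pad the last partial byte with zeros and return the packed bytes."""
        if self.count:
            self.data.append(self.current << (8 - self.count))
            self.current = 0
            self.count = 0
        return bytes(self.data)


writer = BitWriter()
for b in [1, 0, 1, 1, 0, 0, 0, 1, 1, 1]:
    writer.output_bit(b)
print(writer.flush().hex())  # b1c0
```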

Speedup Methods 

Several methods that are used to speed up the code and improve its efficiency are 
enumerated below, along with their functions. 

MP. 

    Asynchronous handling of subblocks and bitstream blocks 
    Frequently used variables stored in internal memory 
    Predefined packet transfer tables 
    Polling for images 
    Overlap of different PPs’ executions 

PP. (See Figure 7-1.) 

    Horizontal and vertical decomposition rewritten in assembly language with parallel instructions 
    Code written with cache in mind 
    Caching of symbols 
    Optimized C code 
    Subordinate pass rewritten in assembly language 
    A stored table of significant coefficients, so that searching is not required 
    Zerotree pruning performed 
    PP code execution from fast SDRAM 
    Only internal memory used for variables and stack (no heap) 
    Fast output_bit() routine written 

FIGURE 7-1. PP Call Tree. 

The files used for this project are: 

Master Processor programs 
    mp.c 

Parallel Processor programs 
    Include files 
        arith.h 
        modlp.h 
    C language files 
        codenew.c 
        j_enc.c 
        main.c 
        modlp.c 
    Assembly language files 
        bitebits.s 
        hvenc.s 
        mean2.s 
        subpass.s 
        ztr.s 

Support files 
    Batch files for making executables 
        cc.bat - makes MP code 
        ccpp.bat - makes PP code 
    Linker command files 
        pcic80.cmd - linker file for MP link 
        pcic80a.cmd - linker file for PP link 

CHAPTER 8. 

REAL-TIME INTRAFRAME DECODING 


The process of decoding is basically the inverse of encoding, where the bitstream is 
changed back into symbols, a zero-tree is formed, and then the coefficients are inverse 
transformed into their pixel representation. 


HIGH LEVEL DESCRIPTION OF ROUTINES USED FOR 
INTRAFRAME WAVELET DECODING 


Master Processor (MP) 

Mp.c is the only routine that runs on the MP. This program's main tasks are to start 
the parallel processors (PPs), to move data to and from the PPs, to coordinate the PPs 
in performing their selected operations, to display images and informational text, and 
to communicate with the personal computer (PC) host. The functions are enumerated in detail here: 

1. Copy PP code to SDRAM from DRAM, and start all PPs executing the same 
code. 

2. Initialize and kick off the task that will do all the work, and hand over to this 
task. 

3. Set up the video display. 

4. Initialize all the packet transfer tables. 

5. Wait for a packet to decode. 

6. Read the current mean, maximum value, FIRST value, and compression 
ratio. 

7. Send a packet of the bitstream to each PP. 

8. Wait for resulting coefficients. 

9. Write coefficients to memory. 

10. Perform 3 1/2 level wavelet synthesis on the coefficients. 

11. Display image. 

12. Display frame rate and processing time. 

Parallel Processors 

Main.c is the routine that initializes all the pointers, calls the function to decode the 
bitstream, and coordinates the wavelet synthesis of the image. 

Eighteen routines are used to perform the decoding of the bitstream into zero-trees. 

The main routine that coordinates this process is p_dec. It coordinates the input of 
the bitstream, the initialization of the arithmetic decoder, and the calling of the routine 
that will decode the bitstream (fast_arith_decode2). 

Fast_arith_decode2 executes one pass through all six of the zero-trees. Each pass 
can affect single or multiple bitplanes: therefore this routine must be called multiple 
times until all the input symbols have been decoded. It traverses the zero-trees in parallel 
from top to bottom, pruning along the way. To turn a symbol into a coefficient it calls 
new_dec_dbits_ri or new_dec_dbits3_ri, depending on what level of the zero-tree it is 
parsing. The symbols are pulled from a small cache of previously decoded symbols, so 
that we do not incur an instruction cache miss on the symbol-fetching code every time 
a symbol is needed. The cache is kept small so that we do not waste time caching 
symbols that will never be used because the number of symbols in the arithmetic decoder 
model has changed. It is impossible to tell a priori how many symbols we will need for a pass, 
because it is dependent upon the symbols that are read in and where they are needed in 
the tree. 
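
The batching idea behind the symbol cache can be sketched in Python as shown below. The class, its capacity, and the decode_symbol callback are hypothetical; in particular, the sketch does not model how the arithmetic decoder handles symbols that were decoded ahead of a model change, which the text does not describe.

```python
class SymbolCache:
    """Decodes symbols in small batches so the decode routine runs less often."""

    def __init__(self, decode_symbol, capacity=4):
        self.decode_symbol = decode_symbol  # callable returning the next symbol
        self.capacity = capacity
        self.buffer = []
        self.pos = 0

    def next_symbol(self):
        if self.pos == len(self.buffer):    # cache empty: refill in one batch
            self.buffer = [self.decode_symbol() for _ in range(self.capacity)]
            self.pos = 0
        sym = self.buffer[self.pos]
        self.pos += 1
        return sym

    def unused(self):
        """Symbols decoded but not yet consumed (wasted if they are discarded)."""
        return len(self.buffer) - self.pos


source = iter("ABCDEFGH")
cache = SymbolCache(lambda: next(source), capacity=4)
print("".join(cache.next_symbol() for _ in range(5)))  # ABCDE
print(cache.unused())                                  # 3
```

A larger capacity would reduce the number of refills but increase the number of symbols decoded for nothing, which is the tradeoff described above.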

We know how many symbols must be read in only on the last level of the tree (if it 
has not been pruned off). Pre_dbits2 takes care of reading in the correct number of 
symbols for this level. 

The pass down function is performed in new_dec_dbits_nri and possibly in 
dec_dbits_fix. 

The subband pass is performed in do_subdec. 

The zero-tree to in-place formation is done by form_img2, which takes a single 
block of coefficients in the zero-tree structure and writes them into another structure in 
the in-place format. 

The arithmetic decoder is implemented in start_model, start_decoding, 
start_inputing_bits, quick_syms, new_decode_symbol, and new_update_model. 

Bits are read out of the incoming bitstream by new_input_bit, one bit at a time. 

Three routines are used to perform the embedded wavelet synthesis: subsyn_vert, 
subsyn_horiz, and add_n_conv8. The subsyn_vert routine performs the vertical 
synthesis, and the subsyn_horiz routine performs the horizontal synthesis. The 
add_n_conv8 routine performs three functions: it adds the previous image's mean to all 
the values, it limits the result to a 10 bit number, and then it shifts the result down 
2 places (making it into an 8-bit number). 
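
The three functions of add_n_conv8 can be sketched in Python as follows. This is an illustration, not the assembly routine itself (addconv.s is presumably its source file). It assumes that the synthesized values are still scaled by 4, so the mean is added in that scaled domain, and that "limiting to a 10-bit number" means clamping to the range 0 to 1023.

```python
def add_n_conv8(values, prev_mean):
    """Convert synthesized values (scaled by 4, mean removed) to 8-bit pixels."""
    pixels = []
    for v in values:
        s = v + (prev_mean << 2)     # 1. add the previous image's mean (scaled, assumed)
        s = max(0, min(1023, s))     # 2. limit the result to a 10-bit number
        pixels.append(s >> 2)        # 3. shift down 2 places, giving 0..255
    return pixels


print(add_n_conv8([-400, 112, 620], prev_mean=100))  # [0, 128, 255]
print(add_n_conv8([-500, 700], prev_mean=100))       # [0, 255]
```

The second call shows the purpose of the limiting step: values pushed out of range by coding error are clipped rather than wrapped when reduced to 8 bits.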

Speedup Methods 

Several methods that are used to speed up the code and improve its efficiency are 
enumerated below. 

MP. 

    Asynchronous handling of subblocks and bitstream blocks 
    Frequently used variables stored in internal memory 
    Predefined packet transfer tables 
    Overlap of different PPs’ executions 

PP. (See Figure 8-1.) 

    Horizontal and vertical synthesis rewritten in assembly language with parallel instructions 
    Code written with cache in mind 
    Caching of symbols 
    Optimized C code 
    Subband pass rewritten in assembly language 
    Zerotree pruning performed 
    PP code execution from fast SDRAM 
    Only internal memory used for variables and stack (no heap) 
    Bit input routine written in assembly language 

FIGURE 8-1. PP Decode Call Tree. 

The files used for this project are: 

Master Processor programs 
    mp.c 

Parallel Processor programs 
    Include files 
        arith.h 
        modlp.h 
    C language files 
        form.c 
        j_enc.c 
        main.c 
        modlp.c 
        faster_ar.c 
    Assembly language files 
        addconv.s 
        decbits.s 
        hvdec.s 
        quick.s 
        sum.s 

Support files 
    Batch files for making executables 
        cc.bat - makes MP code 
        ccpp.bat - makes PP code 
    Linker command files 
        pcic80.cmd - linker file for MP link 
        pcic80a.cmd - linker file for PP link 

CHAPTER 9. 

REAL-TIME INTERFRAME ENCODING 


For conciseness, only the differences between the Interframe and Intraframe methods 
are discussed here, for both encoding and decoding. The other routines have not 
changed and still perform the same functions. 

Figure 9-1 is a block diagram outlining the differences between the Intraframe and 
the Interframe methods of encoding. The Interframe method adds a feedforward step and 
a feedback step to the process. 



FIGURE 9-1. (a) Intraframe. (b) Interframe. (Block diagram not reproduced; surviving labels: Wavelet Transform, Step 1, Step 2.) 

The differences will be outlined by the three different steps that have been 
highlighted in the diagram. Step 1 is used to determine the coefficients that have been 
received by the decoder. This method is more efficient than doing the inverse 
quantization and inverse wavelet transform to determine what coefficients were coded. 
The diff2 routine performs the special summation. This special summation was required 
to correctly determine the actual value that was sent to the decoder. 

Step 2 is our feedback loop to “leak” out the energy from the coefficients that were 
already sent. The routine that implements this loop is the sum1 routine. 

Step 3 is where the new coefficients are compared with a delayed version of what 
has already been coded and sent to the decoder. This is implemented in the diff1 routine. 

Figure 9-2 shows the PP encode call tree. 




FIGURE 9-2. PP Encode Call Tree. 

CHAPTER 10. 

USERS MANUAL FOR REAL-TIME SOFTWARE 


This section describes how to run the real-time software on the PC host platform. 

The software on the PC is set up in a client/server relationship. The software that 
runs on the encode PC is the server, and the software that runs on the decode PC is the 
client. The client initiates requests to the server for a frame of encoded data and 
designates which program will run at the beginning of a connection. 

ENCODE 

The name of the software on the encode side is Gserv. This software needs no user 
interface: it was meant to be started and then left with only the client controlling it. There 
are no buttons or pull downs on its menu. Figure 10-1 shows the window for this 
program. 

To use the Gserv program, simply double click on the Gserv icon on the desktop. A 
camera must be attached to the red input of the PCIC80 video input card, and optionally a 
multisync monitor may be attached to the PCIC80 display output. No images will 
display on the monitor until the Gclient program (running on the decode PC) is invoked 
and the Connect button is pushed. 



FIGURE 10-1. Gserv Window for Encode. 

DECODE 

The name of the software on the decode side is Gclient. This software has a fairly 
detailed user interface: the user can select from four different versions of C80 code, select 
one of five different compression ratios, throttle the bandwidth of the communication 
channel (slow the encoder down), and select the server's IP address. The window also 
displays statistics: frame rate, frame transfer time, total number of frames 
transferred, and number of lost frames. Figure 10-2 shows the 
window for this program. 



FIGURE 10-2. Gclient Window for Decode. 

To use the Gclient program, simply double click on the Gclient icon on the desktop. 
You must then select the IP address of the encoder PC, either from the list or by typing it. 
You then choose which file you would like to run (different files are run on both the 
encoder and the decoder). You may choose the compression ratio that you desire and 
then click Connect. You cannot change the compression ratio or the file selection while 
you are connected: you must first disconnect, then make your choice of file or 
compression ratio, then connect again. 


NAWCWD TP 8442 


CHAPTER 11. 

CONCLUSIONS AND FUTURE WORK 


In the course of this project, we have carefully studied the application of video 
compression to the military remote sensing problem. After considering a variety of 
algorithms, we ultimately concluded that a wavelet-based embedded coding scheme was 
the best choice for these military applications because of its ability to produce 
progressively decodable, compressed representations of images while still achieving 
excellent rate-distortion performance (i.e., minimal distortion for a given compression 
ratio). Starting with the EZW compression algorithm, we first modified it to add 
robustness to transmission errors. This robustness allows our algorithm to gracefully 
degrade in the presence of uncorrected transmission errors—a feature not shared by 
existing standardized algorithms such as MPEG and JPEG. To overcome the major 
drawback of embedded coding—its slow execution speeds—we have developed parallel 
implementations and also introduced the algorithmic concepts of adaptive embedding and 
cache-based zerotree processing. From our studies on parallelization, we determined the 
optimal parallelization of the wavelet transform for an array of parallel processors with 
given communication and execution speeds. We have also developed and applied for a 
patent on parallel SIMD implementations of an EZW encoder and decoder. Our work on 
algorithmic speedups led to the development of adaptive embedding and cache-based 
zerotree processing. These topics are not covered in this technical publication because of 
space considerations, but a paper on them is scheduled to appear in the December 1999 
IEEE Transactions on Image Processing. The final cornerstone of our real-time 
embedded implementation is the concept of entirely in-place wavelet coefficient 
calculation and coding, for which a patent application has been submitted. The use of in- 
place transformation and coding greatly reduces the amount of memory required to 
perform embedded encoding and decoding without negatively affecting the rate- 
distortion performance of the complete system. 

Because the major focus of our research has been to develop compression algorithms 
that function well in the real world (with its unstable, error prone communications 
channels and strict space and power constraints), we felt that it was important to actually 
demonstrate our algorithms using real-time hardware. To accomplish this, we ported our 
algorithms to the TMS320C80 processor and developed a Transmission Control 
Protocol/Internet Protocol (TCP/IP) packetization system to send real-time video across 
the Internet. While transmission of compressed video via a military radio would have 
made for a better demonstration, we did not have access to such equipment nor did we 
have the funds to purchase such equipment. The Internet demonstration is nonetheless 
valuable because ATM transmission channels such as the Internet represent the future of 
communications. Our real-time demonstration system uses adaptive embedding, cache- 
based zerotree processing, and in-place coding to achieve high speed throughput with 
minimal memory requirements. It also uses our robust EZW intraframe compression 
along with “leaky” interframe predictive coding. Furthermore, the software is configured 
so that the operator viewing the decoded video can adjust the compression ratio and turn 
features (e.g., interframe compression) on or off as desired. 

To evaluate the success of an Office of Naval Research (ONR) funded research 
project in today’s environment, one must look at technology transitions. The Data 
Compression Project has been extremely successful in this regard—the next generation 
tactical Tomahawk has selected our compression algorithm for its baseline BDI 
transmission system. Because the demands of their application are not severe (they are 
transmitting only 1 image over a slow SATCOM link), they have selected a fairly basic 
version of our algorithm (EZW compression with in-place coefficient calculation and 
coding) for decreased memory usage and more predictable execution speed. Another 
potential transition we have pursued is the Rockwell/Collins Surgical Strike datalink 
program. To this end, we have ported our algorithm to the same dual TMS320C80 
processor card used in their program along with an identical video capture daughterboard. 
Thus, by just loading our software into their system, we should be able to test our 
algorithm with their communications link. Because of Surgical Strike’s budget 
constraints, however, we have not as yet been able to try out our compression algorithms 
with their system. We hope that an opportunity will arise at some future date. 

Despite our many successful accomplishments during the last 4 years, there are a 
number of things that we could not do because of time limitations and financial 
constraints. For example, we were unable to add out-of-loop motion compensation 
(discussed in Chapter 6) to our real-time demonstration software. Such compensation 
can dramatically improve compression performance in scenes with large camera or 
platform motions. We also did not have the opportunity to optimize our combined 
REZW/leaky predictive video coding algorithm for different packet and/or bit error rates. 
The tables generated by this optimization could then be used to dynamically control 
leakage and partitioning parameters to guarantee the best possible decoded image, given 
the current channel noise state. Other topics we hope to address in our future research 
include real-time region-of-interest compression, list Viterbi convolutional decoders for 
even better error resilience, and embedded algorithms that are optimized to run on VLIW 
(very long instruction word) processors such as TI’s C6X series. 

In summary, we are very pleased with the results that we have achieved in the course 
of this project. We have made significant scientific contributions to the state-of-the-art in 
embedded image compression, and we have also fielded a system that demonstrates the 
real-time capabilities of our research results. Probably more important than our four 
journal papers and two patents, however, is the fact that we have transitioned technology 
developed here with ONR 6.2 funding directly into a weapons system. Although there is 
more work left to do in this area, we believe that, by virtually any measure, this project 
has been highly successful. We are very satisfied with what we have accomplished. 




APPENDIX A 

AERIAL PLATFORM ANALYSIS 
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(a) Lift coefficient comparison of PANAIR solutions and Navier- 
Stokes solutions at sea level with turbulent boundary layer. 
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(b) Drag coefficient comparison of PAN AIR solutions and Navier- 
Stokes solutions at sea level with turbulent boundary layer. 

FIGURE A-1. Force and Moment Comparisons of the PANAIR, With Additional 
Viscous Drag From VSAERO, and Navier-Stokes Solutions. The solutions are for Mach 
0.09 at sea level. All solutions presented in this appendix are without vertical fins or 
actuator disk. 
(c) Pitching moment coefficient comparison of PANAIR solutions and 
Navier-Stokes solutions at sea level with turbulent boundary layer. 
(d) Lift-to-drag ratio comparison of PANAIR solutions and Navier-Stokes 
solutions at sea level with turbulent boundary layer. 

FIGURE A-1. (Contd.) 
(a) The grid point distributions in the symmetry plane and about the canard and wing. 


FIGURE A-2. The Grid Point Distributions in the Symmetry Plane and About the Canard 
and Wing. Mach contours on the upper surface of the canard for laminar and turbulent 
boundary layer specifications at 0-degree angle of attack and sea level altitude. 
(b) Laminar boundary layer, Mach contour distribution about the canard. 


FIGURE A-2. (Contd.) 
(c) Turbulent boundary layer, Mach contour distribution about the canard. 

FIGURE A-2. (Contd.) 
(a) Turbulent boundary layer, Mach contour at the plane of symmetry. 

FIGURE A-3. Mach and Pressure Contours at the Plane of Symmetry for 0-Degree Angle 
of Attack at Sea Level With a Turbulent Boundary Layer. 
(b) Turbulent boundary layer, pressure contour at the plane of symmetry. 


FIGURE A-3. (Contd.) 
(a) Viscous and pressure contributions to drag with a turbulent boundary 
layer at sea level, without the actuator disk. 
(b) Viscous and pressure contributions to drag with a turbulent boundary 
layer at sea level, with the actuator disk. 

FIGURE A-4. Pressure and Viscous Contributions to the Drag Coefficient With 
and Without the Propeller. 
(c) Viscous drag contributions with a turbulent boundary layer at sea level, 
with and without the actuator disk. 
(d) Pressure drag contributions with a turbulent boundary layer at sea level, 
with and without the actuator disk. 




FIGURE A-4. (Contd.) 
(e) Viscous drag contributions with a turbulent boundary layer at sea level 
and 5,000-foot altitude, without the actuator disk. 
(f) Pressure drag contributions with a turbulent boundary layer at sea level 
and 5,000-foot altitude, without the actuator disk. 
(a) Lift coefficient at sea level and 5,000-foot altitude. 
(b) Drag coefficient at sea level and 5,000-foot altitude. 

FIGURE A-5. Force and Moment Solution Comparisons for Sea Level and 
5,000-Foot Altitudes. 
(c) Pitching moment coefficient at sea level and 5,000-foot altitude. 
(d) Lift-to-drag ratio at sea level and 5,000-foot altitude. 
FIGURE A-5. (Contd.) 
(a) Lift coefficient with laminar and turbulent boundary layers. 


(Horizontal axis: Angle-of-Attack, Alpha, Deg) 


(b) Drag coefficient with laminar and turbulent boundary layers. 


FIGURE A-6. Comparison of Forces and Moments With Laminar and Turbulent 
Boundary Layers at Sea Level and 5,000-Foot Altitudes. 
(c) Pitching moment coefficient with laminar and turbulent boundary layers. 
(d) Lift-to-drag ratio with laminar and turbulent boundary layers. 
FIGURE A-6. (Contd.) 
APPENDIX B 

SOURCE CODE LISTING FOR REAL-TIME 
COMPRESSION/DECOMPRESSION SYSTEM 
/* DECLARATIONS FOR ARITHMETIC CODING AND DECODING */ 

/* Size of arithmetic code values */ 

#define Code_value_bits 9            /* Number of bits in a code value */ 

typedef short code_value;            /* Type of arithmetic code value */ 

#define Top_value (((long) 1<<Code_value_bits)-1)   /* Largest code val */ 

/* Half and Quarter points in code value range */ 

#define First_qtr (Top_value/4+1)    /* Point after first quarter */ 
#define Half      (2*First_qtr)      /* Point after first half */ 
#define Third_qtr (3*First_qtr)      /* Point after third quarter */ 
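
As a quick check on the declarations above, the following C sketch evaluates the same macros for a 9-bit code value. The two helper functions (`code_range_ok`, `print_code_points`) are hypothetical and are not part of the original listing; they only make the derived constants easy to inspect.

```c
#include <stdio.h>

#define Code_value_bits 9                              /* as in the listing */
#define Top_value (((long)1 << Code_value_bits) - 1)   /* largest code value */
#define First_qtr (Top_value / 4 + 1)                  /* point after first quarter */
#define Half      (2 * First_qtr)                      /* point after first half */
#define Third_qtr (3 * First_qtr)                      /* point after third quarter */

/* Hypothetical helper: returns 1 when [low, high] is a valid coding interval. */
int code_range_ok(long low, long high)
{
    return low >= 0 && low <= high && high <= Top_value;
}

/* Hypothetical helper: prints the derived constants for inspection. */
void print_code_points(void)
{
    printf("Top_value=%ld First_qtr=%ld Half=%ld Third_qtr=%ld\n",
           (long)Top_value, (long)First_qtr, (long)Half, (long)Third_qtr);
}
```

With 9 bits the quarter points come out as 128, 256, and 384. These appear to correspond to the immediates `(1 \\ 7)`, `(1 \\ 8)`, and `384` used in the `$encode_symbol` assembly later in this appendix, assuming `\\` denotes a left rotate of the constant 1.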
/*****************************************************************************
**
** bitebits.s (PP Program)
**
** Written by Jim Witham, Code 472300D, 939-3599
**
** File that contains the following assembly language subroutines:
**
**    $new_dbits_wo
**    $new_dbits_nwo
**    $new_dbits2
**    $do_syms
**    $encode_symbol
**    I_DIV_JW
**    $update_model
**    $bit_plus_follow
**    $new_output_bit
**
*****************************************************************************/

        .global $new_dbits_wo
        .global $new_dbits_nwo
        .global $new_dbits2
        .global $output_bit
        .global $do_syms
        .global $update_model
        .global $bit_plus_follow
        .global $T_BYTES
        .global $byte_stream
        .global $STOP
        .global $bit_index
        .global $ztl
        .global $THRESH
        .global $stats_flag
        .global $char_to_index
        .global $stats_val
        .global $TMASK
        .global $BITE
        .global $SHFT_DN
        .global $update_model
        .global $encode_symbol
        .global $new_output_bit
        .global $list
        .global $list_index
        .global $pruned_children
        .global $getaway_address
        .global $quick_getaway
        .global $index_to_char
        .global $No_of_symbols
        .global $bits_to_follow
        .global $high
        .global $freq
        .global $low
        .global $cum_freq
        .global $bit_plus_follow


;signed short pointer     sh   *(xba + $ztl)
;signed int               sw   *(xba + $THRESH)
;unsigned char pointer         *(xba + $stats_flag)
;unsigned char pointer         &*(xba + $char_to_index)
;signed short pointer          *(xba + $stats_val)
;unsigned short           uh   *(xba + $TMASK)
;signed int               sw   *(xba + $BITE)
;signed int               sw   *(xba + $SHFT_DN)

        .global $sym_array
        .global $sym_index
        .global $stophere


tempd1      .set    d0
stats_flag  .set    d1
stats_val   .set    d2
tempd2      .set    d3
tflg        .set    d4
tmask       .set    d5
tempd3      .set    d6
t           .set    d7
sym         .set    a12
maxsyms     .set    240


.align 512 

/*******************************************************************
*                                                                  *
* Function   : $new_dbits_wo                                       *
* Args       : m, c1, c3, s                                        *
* Passed in  : (d1, d2, d3, d4)                                    *
*                                                                  *
* Description:                                                     *
*    m  - the current index of this coefficient                    *
*    c1 - the relative index of the first child                    *
*    c3 - the relative index of the third child                    *
*    s  - the subblock that we are in                              *
*                                                                  *
*    $new_dbits_wo will be called to scan a coefficient that is    *
*    on the second or third tier of the tree. The coefficient      *
*    has already been checked for significance on a previous       *
*    bit-plane; now it is checked against the current bit-plane    *
*    threshold for significance, or pass down.                     *
*                                                                  *
* Return Values:                                                   *
*    None                                                          *
*                                                                  *
*******************************************************************/

$new_dbits_wo:

        d0 = &*(sp -= 28)
        *(sp + 16) =w iprs
        *(sp + 12) =w a4
     || *(sp + 8) =w d6
        *(sp + 4) =w a12
     || *(sp + 0) =w d7
        *(sp + 24) = d4                 ;save s onto stack
        d5 = d1
        a1 =uw *(xba + $stats_val)      ;*(a1) = stats_val[s*256]
        a2 =uw *(xba + $stats_flag)
        a3 =uw *(xba + $ztl)            ;*(a3 + [x0]) = ztl[]
        d6 = d4 << 9                    ;256 elements each 2 bytes long times s (<< 9 = * 512)
        a1 = a1 + d6
        d6 = d4 << 8                    ;256 elements each 1 byte long times s (<< 8 = * 256)
        a2 = a2 + d6
        d6 = d4 << 7                    ;ztl[s*64] (where ztl[] is a short)
        a3 = a3 + d6
        d5 = d5 + (d4<<8)               ;m = m + s*256
        *(sp + 20) =uh d5
        a0 = a2                         ;*(a0) = stats_flag[s*256]
        x0 = d1                         ;x0 = m
        x1 = d3                         ;*(a2 + [x1]) = stats_flag[c3]
        x2 = d3 + 1                     ;*(a2 + [x2]) = stats_flag[c4 = c3 + 1]
        a2 = a2 + d2                    ;*(a2) = stats_flag[s*256 + c1]
                                        ;*(a2 + [1]) = stats_flag[c2 = c1 + 1]

        ;*(a1 + [x0]) = stats_val[s*256 + m]
        ;*(a0 + [x0]) = stats_flag[s*256 + m]
        ;*(a2)        = stats_flag[s*256 + c1]
        ;*(a2 + [1])  = stats_flag[s*256 + c2]  (c2 = c1 + 1)
        ;*(a2 + [x1]) = stats_flag[s*256 + c3]
        ;*(a2 + [x2]) = stats_flag[s*256 + c4]  (c4 = c3 + 1)

        tmask =uh *(xba + $TMASK)
        tflg =sh *(a3 + [x0])           ;fetch ztl[m + s*64]
        a15 = tflg&tmask
        tflg = 1 || tflg =[ne] a15      ;calculate tflg

        tflg = tflg - 0
        br =[z] no_change
        stats_val =sh *(a1 + [x0])      ;fetch stats_val[m]
        tempd3 = |stats_val|            ;take abs(stats_val[m])

        tempd1 = tflg << 1
        a4 = tflg
        call = mod_flags
        d4 = *(sp + 24)                 ;reload s from stack into d4
        tempd2 =ub *(a2)
        tflg = a4

no_change:

        t = tempd3 & tmask              ;form TMASK&abs(stats_val[m])
        tempd1 =sw *(xba + $SHFT_DN)    ;fetch SHFT_DN
        tempd1 = -tempd1
        t = t >>u -tempd1               ;form (TMASK&abs(stats_val[m]))>>SHFT_DN

        ;compute sym
        tempd1 =sw *(xba + $BITE)
        tempd1 = %tempd1
        t = t - 0
        tempd1 =[eq] a15
        stats_val = stats_val - 0
        tempd1 =[le] a15

finish_up:

        tempd2 = (~tflg)&1
        tempd2 = tempd2 + t
        tempd2 = tempd2 + tempd1
        x1 = tempd2
xl = tempd2 
        t = t - 0
        br =[le] L68
        nop
        nop

        ;if (t>0)
        ;  stats_val[m] = (abs(stats_val[m])-((t*THRESH)/(1<<(BITE-1))+THRESH/(2<<(BITE-1))))
        ;/* or the equivalent from Chuck's code */
        ;  stats_val[m] = (abs(stats_val[m])-((t*THRESH)>>(BITE-1))+(THRESH>>BITE));

        tempd1 =sw *(xba + $BITE)
        tempd2 =uh *(xba + $THRESH)
        tflg = 1 - tempd1
        tmask =u tempd2 * t
        tempd1 = - tempd1
        tempd3 = tempd3 - (tmask >>u -tflg)
        tempd3 = tempd3 - (tempd2 >>u -tempd1)
        *(a1 + [x0]) =h tempd3

L68:

        tempd1 = 0
        t = t - 0
        tempd1 =[nz] 4                  ;calculate (t!=0)<<2
        *(a0 + [x0]) =ub tempd1
        a15 = tempd1 - 0
        br =[eq] nosig
        x0 =uh *(pba + $list_index)
        a15 = x0 - 254
        br =[ge] nosig
        tempd1 =uh *(sp + 20)
        a0 =uw *(pba + $list)
        tempd2 = x0 + 1
        *(a0 + [x0]) =uh tempd1
        *(pba + $list_index) =uh tempd2

nosig:

        a0 = &*(pba + $sym_array)
        x0 =uh *(pba + $sym_index)
        nop
        tempd2 = x0 + 1
        *(pba + $sym_index) =uh tempd2
        ;sym_array[sym_index++] = sym;
        *(a0 + [x0]) =ub x1
        ;if (sym_index>240) do_syms();
        tempd1 = tempd2 - maxsyms
        call =[gt] $do_syms
        nop
        nop

done_more:

        a4 = *(sp + 12)
        br = *(sp + 16)
        d6 =sw *(sp + 8)
     || d7 =sw *(sp + 0)
        d0 = &*(sp ++= 28)
        ; branch occurs here
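
The quantization arithmetic in the routine above can be sketched in C as follows. This is an illustrative reading of the assembly, not the authors' C source: the function names `extract_t` and `refine_residual` are hypothetical, and the parameters stand in for the globals `TMASK`, `SHFT_DN`, `THRESH`, and `BITE`. The sketch follows the two subtract instructions in the listing (and the first commented expression), in which both the coded magnitude and the half-step offset are subtracted from the absolute value.

```c
#include <stdlib.h>

/* Hypothetical helper: t = (TMASK & |v|) >> SHFT_DN, the BITE-bit
   magnitude symbol taken from the current bit-planes of coefficient v. */
int extract_t(int v, unsigned tmask, int shft_dn)
{
    return (int)(((unsigned)abs(v) & tmask) >> shft_dn);
}

/* Hypothetical helper: residual stored back into stats_val[m] when t > 0.
   The coded magnitude (t*THRESH)>>(BITE-1) and the half-step THRESH>>BITE
   are both subtracted, as in the two subtract instructions of the listing. */
int refine_residual(int v, int t, int thresh, int bite)
{
    return abs(v) - ((t * thresh) >> (bite - 1)) - (thresh >> bite);
}
```

For example, with `THRESH = 64`, `BITE = 1`, `TMASK = 64`, and `SHFT_DN = 6`, a coefficient of 100 gives `t = 1` and a residual of 4, that is, the distance from the reconstruction point 96.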
/*******************************************************************
*                                                                  *
* Function   : $new_dbits_nwo                                      *
* Args       : m, c1, c3, s                                        *
* Passed in  : (d1, d2, d3, d4)                                    *
*                                                                  *
* Description:                                                     *
*    m  - the current index of this coefficient                    *
*    c1 - the relative index of the first child                    *
*    c3 - the relative index of the third child                    *
*    s  - the subblock that we are in                              *
*                                                                  *
*    $new_dbits_nwo will be called when the current coefficient    *
*    and all of its children have been deemed insignificant,       *
*    from the pass down flag.                                      *
*                                                                  *
* Return Values:                                                   *
*    None                                                          *
*                                                                  *
*******************************************************************/


$new_dbits_nwo:

        a2 =uw *(xba + $stats_flag)
        d5 = d4 << 8
        a2 = a2 + d5
        a0 = a2                         ;*(a0 + [x0]) = stats_flag[s*256 + m]
        x0 = d1                         ;x0 = m
        x1 = d3                         ;*(a2 + [x1]) = stats_flag[s*256 + c3]
        x2 = d3 + 1                     ;*(a2 + [x2]) = stats_flag[c4 = c3 + 1]
        a2 = a2 + d2                    ;*(a2) = stats_flag[s*256 + c1]
                                        ;*(a2 + [1]) = stats_flag[c2 = c1 + 1]

        tempd2 =ub *(a0 + [x0])
        tempd2 = tempd2 & 252
        *(a0 + [x0]) =ub tempd2

        tempd1 = 2
        tempd2 =ub *(a2)

mod_flags:

        tempd2 = tempd2 | tempd1
        *(a2) =ub tempd2

        tempd2 =ub *(a2 + [1])
        tempd2 = tempd2 | tempd1
        *(a2 + [1]) =ub tempd2

        tempd2 =ub *(a2 + [x1])
        tempd2 = tempd2 | tempd1
        *(a2 + [x1]) =ub tempd2

        tempd2 =ub *(a2 + [x2])
        tempd2 = tempd2 | tempd1
        *(a2 + [x2]) =ub tempd2

        a2 = &*(pba + $pruned_children)
        x2 = d4                         ;s is in d4



nop 

        tempd1 =ub *(a2 + x2)
        br = iprs

        tempd1 = tempd1 + 1
        *(a2 + x2) =ub tempd1

.align 512 

/*******************************************************************
*                                                                  *
* Function   : $new_dbits2                                         *
* Args       : m, s                                                *
* Passed in  : d1, d2                                              *
*                                                                  *
* Description:                                                     *
*    $new_dbits2 will be called to scan the fourth tier in the     *
*    zero tree.                                                    *
*                                                                  *
* Return Values:                                                   *
*    None                                                          *
*                                                                  *
*******************************************************************/

$new_dbits2:

        d0 = &*(sp -= 24)
        *(sp + 16) =w iprs
        *(sp + 8) =w a4
        *(sp + 4) =w d6
        *(sp + 0) =w d7

        a0 =uw *(xba + $stats_flag)
        a1 =uw *(xba + $stats_val)      ;*(a1) = stats_val[s*256]
        d5 = d1

        d3 = d2 << 9
        a1 = a1 + d3

        d3 = d2 << 8
        a0 = a0 + d3

        d5 = d5 + (d2 << 8)
        *(sp + 20) =uh d5               ;index = m + s*256
        x0 = d1                         ;x0 = m
                                        ;*(a0 + [x0]) = stats_flag[s*768 + m]
                                        ;*(a1 + [x0]) = stats_val[s*768 + m]

        tmask =uh *(xba + $TMASK)
        stats_val =sh *(a1 + [x0])      ;fetch stats_val[m]
        tempd3 = |stats_val|            ;take abs(stats_val[m])
        t = tempd3 & tmask              ;form TMASK&abs(stats_val[m])
        tempd1 =sw *(xba + $SHFT_DN)    ;fetch SHFT_DN
        tempd1 = -tempd1
        t = t >>u -tempd1               ;form (TMASK&abs(stats_val[m]))>>SHFT_DN
        ;compute symbol
        tempd2 =sw *(xba + $BITE)
        tempd2 = %tempd2

        stats_val = stats_val - 0
        tempd2 =[le] a15

        t = t - 0
        tempd2 =[eq] a15
        tempd2 =[ne] tempd2 + 1         ;form (t!=0) and add to (1<<BITE)-1

        t = t - 0
        br =[le] L69
        x1 = tempd2 + t                 ;sym = t + (((t!=0)&&(stats_val[m]>0)) ? ((1<<BITE)-1) : 0) + (t!=0)
        a4 = &*(xba + $char_to_index)

        ;if (t>0)
        ;  stats_val[m] = (abs(stats_val[m])-((t*THRESH)>>(BITE-1))+(THRESH>>BITE));

        tempd1 =sw *(xba + $BITE)
        tempd2 =uh *(xba + $THRESH)
        tflg = 1 - tempd1
        tmask =u tempd2 * t
        tempd1 = - tempd1
        tempd3 = tempd3 - (tmask >>u -tflg)
        tempd3 = tempd3 - (tempd2 >>u -tempd1)
        *(a1 + [x0]) =h tempd3

L69:

        tempd1 = 0
        t = t - 0
        tempd1 =[nz] 4                  ;calculate (t!=0)<<2
        *(a0 + [x0]) =ub tempd1
        a15 = tempd1 - 0
        br =[eq] nosig2
        x0 =uh *(pba + $list_index)
        a15 = x0 - 254
        br =[ge] nosig2
        tempd1 =uh *(sp + 20)
        a0 =uw *(pba + $list)
        tempd2 = x0 + 1
        *(a0 + [x0]) =uh tempd1
        *(pba + $list_index) =uh tempd2

nosig2:

        a0 = &*(pba + $sym_array)
        x0 =uh *(pba + $sym_index)
        tempd2 = x0 + 1
        *(pba + $sym_index) =uh tempd2
        ;sym_array[sym_index++] = sym;
        *(a0 + [x0]) =ub x1
        ;if (sym_index>240) do_syms();
        tempd1 = tempd2 - maxsyms
        call =[gt] $do_syms
        nop
        nop

        a4 =sw *(sp + 8)
        br = *(sp + 16)
        d6 =sw *(sp + 4)
     || d7 =sw *(sp + 0)
        d0 = &*(sp ++= 24)

        .align 512


/*******************************************************************
*                                                                  *
* Function   : do_syms                                             *
* Args       : none                                                *
*                                                                  *
* Description:                                                     *
*    do_syms will be called to empty the symbol cache.             *
*                                                                  *
* Return Values:                                                   *
*    None                                                          *
*                                                                  *
*******************************************************************/


$do_syms:

        d0 = &*(sp -= 20)
        *(sp + 16) =w iprs

        *(sp + 12) =w a4
        *(sp + 8) =w d6

        *(sp + 4) =w a12
     || *(sp + 0) =w d7

        *(xba + $stophere) =uw a15

        a12 = &*(xba + $char_to_index)
        a4 = &*(xba + $sym_array)
        d6 =uh *(xba + $sym_index)
        x8 =ub *a4++
        nop

more_syms:

        d7 =ub *(a12 + x8)

        call = $encode_symbol
        nop
        d1 = d7

        call = $update_model
        nop
        d1 = d7

        d1 =sw *(xba + $STOP)
        d1 = d1 - 1
        br =[eq] get_out2
        nop

        d6 = d6 - 1
        br =[gt] more_syms
        x8 =ub *a4++
        nop
get_out:

        a4 =w *(sp + 12)
     || d6 =w *(sp + 8)
        a12 =w *(sp + 4)
     || d7 =w *(sp + 0)
        br = *(sp + 16)
        *(pba + $sym_index) =uh a15
        d0 = &*(sp ++= 20)

get_out2:

        a4 =w *(sp + 12)
     || d6 =w *(sp + 8)
        a12 =w *(sp + 4)
     || d7 =w *(sp + 0)
        d0 = *(sp + 16)
        d2 = *(pba + $getaway_address)
        d1 =ub *(pba + $quick_getaway)
        d1 = d1 - 1
        d0 =[eq] d2
        br = d0
        *(pba + $sym_index) =uh a15
        d0 = &*(sp ++= 20)


/*******************************************************************
*                                                                  *
* Function   : encode_symbol                                       *
* Args       : none                                                *
*                                                                  *
* Description:                                                     *
*    encode_symbol will be called to perform the arithmetic        *
*    encoding of a symbol. This routine was lifted from a C        *
*    compiled program and included here, for cache coherency.      *
*                                                                  *
* Return Values:                                                   *
*    None                                                          *
*                                                                  *
*******************************************************************/


$encode_symbol:

        x0 = d1
        a0 = &*(xba + $cum_freq)
        d3 =sw *(xba + $low)
        d2 =sw *(xba + $high)
        d1 = x0 << 1
        d4 = d2 - d3
     || d2 =g a0
        a1 = d1 + d2
     || d0 = &*(sp -= 4)
        d4 = d4 + 1
     || *(sp) =w iprs
        d1 =uh1 d4
     || d5 =sh *(a1 - 2)
        d1 =u d5 * d1
     || d2 =uh1 d5
        d2 =u d2 * d4



     ||

L19:

        d1 =u d5 * d4
        d2 = d2 + d1
        call = I_DIV_JW
        d1 = d1 + (d2 << 16)
        x1 =sh *a0
        d2 = x1
        d0 =uh1 d4
        d1 =sh *(a0 + [x0])
        d0 =u d0 * d1
        d2 =uh1 d1
        d2 =u d4 * d2
        d1 =u d4 * d1
        d2 = d0 + d2
        d1 = d1 + (d2 << 16)
        d4 = d5 + d3
        d2 = x1
        call = I_DIV_JW
        d4 = d4 - 1
        *(xba + $high) =w d4
        d2 = d5 + d3
        d1 = d4 - (1 \\ 8)
        br =[ge] L25
        *(xba + $low) =w d2
        d1 =[ge] d2 - (1 \\ 8)

        call = $bit_plus_follow
        nop
        d1 = 0

L20:

     ||

L21:

     ||

        br = L23
        d1 =sw *(xba + $high)
        d1 = d1 << 1

        d1 =sh *(xba + $bits_to_follow)
        d2 = d1 + 1
        d3 =sw *(xba + $low)
        *(xba + $bits_to_follow) =h d2
        d2 = d3 - (1 \\ 7)
        d1 =sw *(xba + $high)
        br = L22
        d1 = d1 - (1 \\ 7)
        *(xba + $low) =w d2
        *(xba + $high) =w d1

        call = $bit_plus_follow
        nop
        d1 = 1
        d1 =sw *(xba + $low)
        d1 = d1 - (1 \\ 8)
        d2 =sw *(xba + $high)
        d1 = d2 - (1 \\ 8)
        *(xba + $low) =w d1
        *(xba + $high) =w d1
L22:

        d1 = d1 << 1

L23:

        d4 = d1 + 1
     || d1 =sw *(xba + $low)
        d2 = d1 << 1
        d1 = d4 - (1 \\ 8)
        br =[lt] L19
        *(xba + $high) =w d4
        *(xba + $low) =w d2

L24:

L25:

        d1 = d2 - (1 \\ 8)
        br =[ge] L21
        nop
        d1 =[lt] d2 - (1 \\ 7)
        br =[lt] L30
        nop
        d1 =[ge] d4 - 384
        br =[lt] L20
        br =[ge] L30
        nop
        nop

L30:

        br = *(sp)
        nop
        d0 = &*(sp ++= 4)
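
The header comment states that this routine was lifted from compiled C. The sketch below shows the interval-narrowing and renormalization logic that the branch structure above appears to implement, written in the style of the well-known Witten, Neal, and Cleary arithmetic coder with the 9-bit code values declared earlier in this appendix. It is a hedged reconstruction, not the original source: the bit sink (`out_bits`, `put_bit`) is hypothetical, and the macro names are shortened for the example.

```c
#define CODE_BITS 9
#define TOP   ((1L << CODE_BITS) - 1)   /* largest code value (511) */
#define QTR1  (TOP / 4 + 1)             /* 128 */
#define HALF  (2 * QTR1)                /* 256 */
#define QTR3  (3 * QTR1)                /* 384 */

static long low = 0, high = TOP;        /* current coding interval */
static int  bits_to_follow = 0;         /* pending opposite bits */
static int  out_bits[64];               /* hypothetical bit sink */
static int  out_count = 0;

/* Hypothetical sink: records each output bit in an array. */
static void put_bit(int b)
{
    if (out_count < 64) out_bits[out_count++] = b;
}

/* Emits b, then any pending opposite bits. */
static void bit_plus_follow(int b)
{
    put_bit(b);
    while (bits_to_follow > 0) { put_bit(!b); bits_to_follow--; }
}

/* cum_freq[0] is the total count; cum_freq[] decreases with the index. */
void encode_symbol(int symbol, const short cum_freq[])
{
    long range = (high - low) + 1;
    high = low + (range * cum_freq[symbol - 1]) / cum_freq[0] - 1;
    low  = low + (range * cum_freq[symbol]) / cum_freq[0];
    for (;;) {
        if (high < HALF) {
            bit_plus_follow(0);                     /* interval in lower half */
        } else if (low >= HALF) {
            bit_plus_follow(1);                     /* interval in upper half */
            low -= HALF; high -= HALF;
        } else if (low >= QTR1 && high < QTR3) {
            bits_to_follow++;                       /* interval straddles the middle */
            low -= QTR1; high -= QTR1;
        } else {
            break;
        }
        low = 2 * low; high = 2 * high + 1;         /* scale up the interval */
    }
}
```

With a two-symbol model whose cumulative counts are `{4, 2, 0}`, encoding symbol 1 and then symbol 2 emits the bits 1 and 0 and returns the interval to its full range. The two divisions by `cum_freq[0]` correspond to the two `I_DIV_JW` calls in the assembly.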

/*******************************************************************
*                                                                  *
* Function   : I_DIV_JW                                            *
* Args       : none                                                *
*                                                                  *
* Description:                                                     *
*    I_DIV_JW will be called to perform an Integer Divide.         *
*                                                                  *
* Return Values:                                                   *
*    None                                                          *
*                                                                  *
*******************************************************************/

;******************************************************************************
;*  I_DIV.ASM  v1.10 - Integer Divide                                         *
;*  Copyright (c) 1993-1995  Texas Instruments Incorporated                   *
;******************************************************************************

; +--------------------------------------------------------------------------+
; | i_div.asm = PP assembly program that is used to return a 32-bit          |
; |             signed integer quotient from 32-bit signed integer           |
; |             division when called by a C program.                         |
; |                                                                          |
; +--------------------------------------------------------------------------+

        .global I_DIV_JW
; +--------------------------------------------------------------------------+
; | 32-bit Signed Integer Word Divide Subroutine :                           |
; |  o Input 32-bit signed integer Operand 1 is in d1 (numerator).           |
; |  o Input 32-bit signed integer Operand 2 is in d2 (divisor).             |
; |  o Output 32-bit signed integer is in d5 (Answer = quotient).            |
; |  o Output 32-bit signed remainder is discarded.                          |
; |  o 0 input divisor produces 0x80000000 output with overflow set.         |
; |  o Quotient = 0x80000000 sets overflow.                                  |
; |  o Number of Stack Words used = 3.                                       |
; |  o MF register is saved.                                                 |
; |  o NOTE: Loop Counter 2 Registers are used but NOT restored !            |
; +--------------------------------------------------------------------------+
; +--------------------------------------------------------------------------+
; |  32 bit / 32 bit ==> 32 bit signed quotient                              |
; |  Signed PP Integer Division                                              |
; |  Numerator / Denominator = Quotient + Remainder (discarded)              |
; |  Divide by 0 produces 80000000 and sets sr(V)                            |
; |  Divide Overflow is not possible if Divisor is non-zero,                 |
; |  except 80000000/ffffffff = 80000000 will set sr(V).                     |
; |  MF register is preserved.                                               |
; +--------------------------------------------------------------------------+

arg1:   .set    d1      ; input argument 1 = Numerator (32 low bits)
arg2:   .set    d2      ; input argument 2 = Divisor (32 bits)
ans:    .set    d5      ; answer = 32 bit signed quotient
Div:    .set    d3      ; Input Divisor
Num:    .set    d4      ; Input high Numerator = 0
Tmp:    .set    d5      ; ALU output for each DIVI

I_DIV_JW:                               ; Signed Word Integer Divide: Ans = Op1 / Op2

        Div = 0 - |arg2|                ; negate |divisor|
     || *(sp-=[3]) = Div                ; || push Div
        br =[z] Div_By_0                ; Divide By 0 ?
        Num = 0                         ; high numerator = 0
     || *(sp+[1]) = mf                  ; || push mf
     || *(sp+[2]) = Num                 ; || push Num
        mf = |arg1|                     ; input lo |numerator|
        lrse2 = 29                      ; loop count - 1
        Tmp = divi(Div, Num=Num)        ; 1-st divide iterate
        Tmp = divi(Div, Num=Tmp[n]Num)  ; 2-nd divide iterate
LoopSW:
        Tmp = divi(Div, Num=Tmp[n]Num)  ; divide iterate 3-32
        ans = mf                        ; |ans| = mf
     || Div = *sp++                     ; || pop Div
        Num = arg1 ^ arg2               ; quotient sign
     || br = iprs                       ; || return
        ans =[n] -ans                   ; quotient is negative,
     || mf = *sp++                      ; || pop mf
        Num = *sp++                     ; pop Num

Div_By_0:                               ; Divide By 0     \_ Optional Error
Div_Ovfl:                               ; Divide Overflow /  Return Code
        br = iprs                       ; return
     || Div = *sp++                     ; || pop Div
        mf = *sp++                      ; pop mf
        ans = 0 - 1<<31                 ; returns 0x80000000, sets sr(V)
     || Num = *sp++                     ; || pop Num . . . [END]
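
The behaviour documented for this subroutine (divide the magnitudes, fix the sign from `arg1 ^ arg2`, return 0x80000000 for a zero divisor) can be sketched in portable C. The sketch substitutes a plain restoring shift-and-subtract loop for the PP `divi` iterations and does not model the `sr(V)` overflow flag; the function name `i_div` is hypothetical.

```c
#include <stdint.h>

/* Signed 32-bit divide built from an unsigned shift-and-subtract loop.
   Returns INT32_MIN (0x80000000) when the divisor is zero. */
int32_t i_div(int32_t num, int32_t den)
{
    if (den == 0) return INT32_MIN;

    /* Work on magnitudes, as the assembly does with |arg1| and |arg2|. */
    uint32_t n = num < 0 ? 0u - (uint32_t)num : (uint32_t)num;
    uint32_t d = den < 0 ? 0u - (uint32_t)den : (uint32_t)den;
    uint32_t q = 0, r = 0;

    for (int i = 31; i >= 0; i--) {          /* one quotient bit per step */
        r = (r << 1) | ((n >> i) & 1u);
        if (r >= d) { r -= d; q |= 1u << i; }
    }

    if ((num ^ den) < 0) q = 0u - q;         /* quotient sign = sign(num ^ den) */
    return (int32_t)q;                       /* assumes two's-complement wraparound */
}
```

As in the original, the remainder is discarded and the quotient truncates toward zero.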
/*******************************************************************
*                                                                  *
* Function   : update_model                                        *
* Args       : none                                                *
*                                                                  *
* Description:                                                     *
*    update_model will be called to update the arithmetic          *
*    model's parameters. This routine was lifted from a C          *
*    compiled program and included here, for cache coherency.      *
*                                                                  *
* Return Values:                                                   *
*    None                                                          *
*                                                                  *
*******************************************************************/


$update_model:

        a0 = &*(xba + $cum_freq)
        nop
        a1 = d1
     || d1 =sh *a0
        d1 = d1 - 75
        br =[ne] L6
        nop
        a2 =g [ne.ncvz] a1
        d2 =sw *(xba + $No_of_symbols)
        d1 = d2 - 0
        br =[lt] L6
        nop
        a2 =g [lt.ncvz] a1
        d1 =g a0
        le1 = L5 - 8
        d5 = d2 << 1
     || d3 = &*(xba + $freq)
        a0 = d5 + d1
        d4 = d2 << 1
     || lrs1 = d2
        a8 = d4 + d3
        d2 = 0

L4:

        d1 =sh *a8
        d1 = d1 + 1
        d3 = d1 >>u 31
        d1 = d1 + d3
        d1 = d1 >>s 1
        d1 =sh0 d1
        *a8 =h d1
        *a0-- =h d2
        d1 =sh *a8--
        d2 = d2 + d1

L5:

        a2 =g a1

L6:

        d3 = &*(xba + $freq)
        d1 = a2 << 1
        d5 = a2 << 1
        d4 = d3 + d1
        a0 = d5 + d3
        a8 = d4 - 2
        nop
        d4 =sh *a8
     || d3 =sh *a0
        d3 = d3 - d4
        br =[ne] L9
        d2 =g [ne.ncvz
        d3 =[ne] d2 - i

L7:

        d1 = d1 - 2
     || d3 =sh *--a8
        a2 = a2 - 1
     || d4 =sh *--a0
        d3 = d4 - d3
        br =[eq] L7
        nop
        nop

L8:

        d2 =g a2
        d3 = d2 - a1

L9:

        br =[ge] L11
        nop
        nop

        x0 = &*(xba + $index_to_char)
        nop
        a8 =ub *(a2 + x0)
        x8 = &*(xba + $char_to_index)
        a9 =ub *(a1 + x0)
        *(a2 + x0) =b a9
        *(a1 + x0) =b a8
        *(a8 + x8) =b a1
        *(a9 + x8) =b a2

L11:

        d3 =sh *a0
        d3 = d3 + 1
        d3 = a2 - 0
     || *a0 =h d3
        br =[le] L15
        br =[le] L16
        nop

        le2 = L14 - 8
        d2 = a2 - 1
        d3 = &*(xba + $cum_freq)
        lrs2 = d2
        a0 = d1 + d3
        nop

L13:

        d1 =sh *--a0
        d1 = d1 + 1
        *a0 =h d1

L14:

        br = L16
        nop

L15:

        nop

L16:

        br = iprs
        nop
        nop
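
For orientation, the adaptive-model update that this routine appears to implement can be sketched in C along the lines of the standard Witten, Neal, and Cleary coder: halve all counts when the total reaches a limit, keep symbols sorted by frequency, then increment the counts. This is an assumption-laden sketch, not the original source. The alphabet size `NSYM`, the initializer `init_model`, and the array sizes are hypothetical; the limit of 75 is taken from the comparison against `cum_freq[0]` in the listing.

```c
#define NSYM 4                         /* hypothetical small alphabet */
#define MAX_FREQUENCY 75               /* limit compared with cum_freq[0] in the listing */

static short freq[NSYM + 1];           /* freq[0] is a zero sentinel */
static short cum_freq[NSYM + 1];       /* cum_freq[i] = sum of freq[j] for j > i */
static unsigned char index_to_char[NSYM + 1];
static unsigned char char_to_index[NSYM];

/* Hypothetical initializer: every symbol starts with a count of 1. */
void init_model(void)
{
    for (int i = 0; i < NSYM; i++) {
        char_to_index[i] = (unsigned char)(i + 1);
        index_to_char[i + 1] = (unsigned char)i;
    }
    for (int i = 0; i <= NSYM; i++) {
        freq[i] = 1;
        cum_freq[i] = (short)(NSYM - i);
    }
    freq[0] = 0;
}

void update_model(int symbol)
{
    int i;
    if (cum_freq[0] == MAX_FREQUENCY) {        /* halve all counts, rounding up */
        int cum = 0;
        for (i = NSYM; i >= 0; i--) {
            freq[i] = (short)((freq[i] + 1) / 2);
            cum_freq[i] = (short)cum;
            cum += freq[i];
        }
    }
    for (i = symbol; freq[i] == freq[i - 1]; i--)
        ;                                      /* find the symbol's new index */
    if (i < symbol) {                          /* swap the translation tables */
        int ch_i = index_to_char[i];
        int ch_s = index_to_char[symbol];
        index_to_char[i] = (unsigned char)ch_s;
        index_to_char[symbol] = (unsigned char)ch_i;
        char_to_index[ch_i] = (unsigned char)symbol;
        char_to_index[ch_s] = (unsigned char)i;
    }
    freq[i]++;                                 /* count the symbol */
    while (i > 0) {                            /* update cumulative counts */
        i--;
        cum_freq[i]++;
    }
}
```

The `(freq[i] + 1) / 2` halving matches the add-one, add-sign-bit, arithmetic-shift sequence in the `L4` loop, and the four byte stores before `L11` match the table swap.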
/*******************************************************************
*                                                                  *
* Function   : bit_plus_follow                                     *
* Args       : none                                                *
*                                                                  *
* Description:                                                     *
*    bit_plus_follow will be called to output several bits that    *
*    have been encoded by the arithmetic encoder.                  *
*                                                                  *
* Return Values:                                                   *
*    None                                                          *
*                                                                  *
*******************************************************************/


$bit_plus_follow:

        d0 = &*(sp -= 8)
        *(sp + 4) =w iprs
        call = $new_output_bit
        nop

        d6 = d1
     || *(sp + 0) =w d6

        d1 =sh *(xba + $bits_to_follow)
        d1 = d1 - 0
        br =[le] L36
        nop

        d1 =[gt] d6 - 0
        d6 = 1 || d6 =[ne] a15

L34:

        call = $new_output_bit
        nop
        d1 = d6

        d1 =sh *(xba + $bits_to_follow)
        d1 = d1 - 1
        d1 =sh0 d1
        d1 = d1 - 0
     || *(xba + $bits_to_follow) =h d1
        br =[gt] L34
        nop
        nop

L36:

        br = *(sp + 4)
        nop
        d6 =sw *(sp + 0)
     || d0 = &*(sp ++= 8)
/*******************************************************************
*                                                                  *
* Function   : new_output_bit                                      *
* Args       : d1 - the bit to append                              *
*                                                                  *
* Description:                                                     *
*    new_output_bit will be called to append a single bit to       *
*    the bitstream array.                                          *
*                                                                  *
* Return Values:                                                   *
*    None                                                          *
*                                                                  *
*******************************************************************/

$new_output_bit:

        d2 =sw *(xba + $STOP)
        d2 = d2 - 1
        br =[eq] iprs

        d2 =uh *(xba + $bit_index)
        d4 = d2 >>u 3
     || d3 =uw *(xba + $byte_stream)

        a0 = d4 + d3
        d0 =uh *(xba + $T_BYTES)

        d4 = (d2&7)
     || d3 =ub *a0

        d3 = d3 | (d1 << d4)

        *a0 =b d3
     || d5 = d2 + 1

        *(xba + $bit_index) =h d5
        d5 = d5 - (d0 << 3)
        br =g iprs
        d1 = 1 || d1 =[lt] a15          ;calculate new STOP value
        *(xba + $STOP) =w d1
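
The bit-packing step above is what gives the coder its fixed-rate behaviour: bits are written least-significant-bit first into `byte_stream`, and `STOP` is raised once the byte budget `T_BYTES` is exhausted. A minimal C sketch of that logic follows, assuming the buffer starts zeroed; the buffer size and the static storage are hypothetical stand-ins for the globals referenced in the assembly.

```c
#define T_BYTES 2                           /* hypothetical byte budget */

static unsigned char  byte_stream[T_BYTES]; /* zero-initialised output buffer */
static unsigned short bit_index = 0;        /* index of the next bit to write */
static int            STOP = 0;             /* set to 1 when the budget is used up */

void new_output_bit(int bit)
{
    if (STOP == 1)
        return;                             /* budget exhausted: ignore the bit */

    /* Byte = bit_index / 8, position within the byte = bit_index % 8. */
    byte_stream[bit_index >> 3] |= (unsigned char)(bit << (bit_index & 7));
    bit_index++;

    /* STOP stays 0 while bit_index < 8 * T_BYTES, otherwise becomes 1. */
    STOP = (bit_index < (T_BYTES << 3)) ? 0 : 1;
}
```

Because every caller checks `STOP`, encoding can be cut off at an exact byte count, and the embedded ordering of the bit stream means that the bits already written still form a usable prefix.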
mpcl -s -g -c -i\pcic80\include mp.c 

mvplnk -x mp.obj num2.obj pp0.out pcic80.cmd -o mp.out -m mp.map 

erase *.o 
ppcl -s -k main.c 
ppcl -s -k j_enc.c 
ppcl -s -k modlp.c 
ppasm coded.s 
ppasm bitebits.s 
ppasm subpass.s 
ppasm hvenc.s 
ppasm mean2.s 
ppasm ztr.s 

mvplnk -x main.o hvenc.o mean2.o j_enc.o coded.o ztr.o bitebits.o subpass.o modlp.o 
pcic80a.cmd -t runpp0 -o pp0.out -m pp0.map 
/*******************************************************************
**
** codenew.c (PP Program)
**
** Written by Jim Witham, Code 472300D, 939-3599
**
** Contains code_dlist2 subroutine.
**
*******************************************************************/

#include <mvp.h>
#include "arith.h"
#include "modlp.h"

#define XSIZE 512
#define YSIZE 256
#define NSCALES 5
#define AX 16               /* = XSIZE/BS */
#define AY 8                /* = YSIZE/BS */
#define BS 32               /* = 2 ^ NSCALES */
#define maxsyms 240

extern unsigned char emubrk;
extern unsigned char *stats_flag;
extern int STOP;
extern unsigned short syms_to_do;
extern unsigned short *save_array;
extern unsigned char read_more;
extern unsigned short left_off;
extern int FIRST_DEC;
extern unsigned char RB_DEC;
extern unsigned char PASS_DEC;
extern unsigned short THRESH;
extern unsigned short *list;
extern unsigned short list_index;
extern unsigned char index_to_char[Max_No_of_symbols+1];
extern short *stats_val;
extern unsigned short symbols_read;
extern int BITE;
extern unsigned char PASS_ENC;
extern unsigned short TMASK;
extern int SHFT_DN, RB_ENC;
extern unsigned char sym_array[256];
extern unsigned short sym_index;
extern unsigned char pruned_children[8];  /* used to keep count of the number of pruned */
                                          /* children within a zero tree */
extern unsigned char total_links;         /* keeps count of pruned trees */
extern unsigned char link_list[6];        /* list with unpruned trees in it */


/*******************************************************************
*                                                                  *
* Function   : code_dlist2                                         *
* Args       : none                                                *
*                                                                  *
* Description:                                                     *
*    code_dlist2 scans six zero trees in order for significant     *
*    coefficients, it also prunes off zero trees that have no      *
*    significant coefficients below the current level.             *
*                                                                  *
* Return Values:                                                   *
*    None                                                          *
*                                                                  *
*******************************************************************/


void code_dlist2()
{
    int i,p,s;
    unsigned short t;
    register unsigned int m,c;

    /* Perform Dominant pass */
    for (s=0;s<6;s++) pruned_children[s] = 0;
    for (s=0;s<6;s++) comp_dbits(0,1,2,3,s);

    m = 0;
    for (s=0;s<6;s++)
    {
        if (pruned_children[s] < 1)
        {
            link_list[m++] = s;
            pruned_children[s] = 0;
        }
        else
        {
            pruned_children[s] = 255;
            for (i=(s*256)+1;i<((s*256)+4);i++)
                stats_flag[i] = stats_flag[i] & 252;
        }
    }
    total_links = m;

    if ((STOP==0) && (total_links!=0))
        for (m=1;m<4;m++)
        {
            c = 4*m;
            for (i=0;i<total_links;i++)
            {
                s = link_list[i];
                if ((stats_flag[m + s*256]&2)==2)
                    new_dbits_nwo(m,c,2,s);
                else if ((stats_flag[m + s*256]&6)==0)
                    new_dbits_wo(m,c,2,s);
            }
        }

    m = 0;
    for (s=0;s<6;s++)
    {
        if (pruned_children[s] < 3)
        {
            link_list[m++] = s;
            pruned_children[s] = 0;
        }
        else
        {
            if (pruned_children[s] != 255)
                for (i=(s*256+4);i<(s*256+16);i++)
                    stats_flag[i] = stats_flag[i] & 252;
            pruned_children[s] = 255;
        }
    }
    total_links = m;

    if ((STOP==0) && (total_links!=0))
        for (m=4;m<16;m++)
        {
            c = 8*(m/2) + 2*(m%2);
            for (i=0;i<total_links;i++)
            {
                s = link_list[i];
                if ((stats_flag[m + s*256]&2)==2)
                    new_dbits_nwo(m,c,4,s);
                else if ((stats_flag[m + s*256]&6)==0)
                    new_dbits_wo(m,c,4,s);
            }
        }

    m = 0;
    for (s=0;s<6;s++)
    {
        if (pruned_children[s] < 12)
        {
            link_list[m++] = s;
            pruned_children[s] = 0;
        }
        else
        {
            if (pruned_children[s] != 255)
                for (i=(s*256+16);i<(s*256+64);i++)
                    stats_flag[i] = stats_flag[i] & 252;
            pruned_children[s] = 255;
        }
    }
    total_links = m;

    if ((STOP==0) && (total_links!=0))
        for (m=16;m<64;m++)
        {
            c = 16*(m/4) + 2*(m%4);
            for (i=0;i<total_links;i++)
            {
                s = link_list[i];
                if ((stats_flag[m + s*256]&2)==2)
                    new_dbits_nwo(m,c,8,s);
                else if ((stats_flag[m + s*256]&6)==0)
                    new_dbits_wo(m,c,8,s);
            }
        }

    m = 0;  /* reset the link count, as in the earlier tiers */
    for (s=0;s<6;s++)
    {
        if (pruned_children[s] < 48)
        {
            link_list[m++] = s;
            pruned_children[s] = 0;
        }
        else
        {
            if (pruned_children[s] != 255)
                for (i=(s*256+64);i<(s*256+256);i++)
                    stats_flag[i] = stats_flag[i] & 252;
            pruned_children[s] = 255;
        }
    }
    total_links = m;

    if ((STOP==0) && (total_links!=0))
        for (m=64;m<256;m++)
        {
            for (i=0;i<total_links;i++)
            {
                s = link_list[i];
                if ((stats_flag[m+s*256]&2)==2)
                    stats_flag[m+s*256] = stats_flag[m+s*256] & 252;
                else if ((stats_flag[m+s*256]&6)==0)
                    new_dbits2(m,s);
            }
        }

    if (sym_index>0)
        do_syms();
    sym_index = 0;
    start_model(2);

    if (STOP==0)
        PASS_ENC += BITE;
    t = THRESH>>BITE;
    THRESH = THRESH>>BITE;
    if (PASS_ENC == BITE)
    {
        BITE = RB_ENC;
        TMASK = THRESH;
        for (m=1;m<BITE;m++)
            TMASK = TMASK | (THRESH>>m);
    }
    else
    {
        BITE = 1;
        TMASK = THRESH;
    }
    SHFT_DN -= BITE;

    /* Subordinate Pass */
    subpass(t);
    if (sym_index>0)
        do_syms();
    sym_index = 0;
    start_model(1<<(BITE+1));
}
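
Two small pieces of arithmetic in `code_dlist2` are easy to check in isolation: the construction of `TMASK` from `THRESH` and `BITE`, and the first-child index `c` used on each tier of the zero tree. The sketch below restates both as stand-alone C functions. The function names `build_tmask` and `first_child` are hypothetical, and returning 0 for indices outside tiers 1 to 3 is a convention of this example only.

```c
/* Builds the BITE-bit mask whose most significant set bit is THRESH,
   mirroring the loop TMASK = TMASK | (THRESH>>m) for m = 1..BITE-1. */
unsigned short build_tmask(unsigned short thresh, int bite)
{
    unsigned short tmask = thresh;
    for (int m = 1; m < bite; m++)
        tmask |= (unsigned short)(thresh >> m);
    return tmask;
}

/* First-child index c for a coefficient index m, using the same
   expressions as the three tier loops of code_dlist2. */
unsigned first_child(unsigned m)
{
    if (m >= 1 && m < 4)   return 4 * m;                      /* tier 1 */
    if (m >= 4 && m < 16)  return 8 * (m / 2) + 2 * (m % 2);  /* tier 2 */
    if (m >= 16 && m < 64) return 16 * (m / 4) + 2 * (m % 4); /* tier 3 */
    return 0;   /* hypothetical sentinel: no child index in this sketch */
}
```

For instance, `THRESH = 64` with `BITE = 3` yields a mask of 112 (binary 1110000), so three bit-planes are coded in one pass.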
/*****************************************************************************
**
** hvenc.s (PP Program)
**
** Written by Jim Witham, Code 472300D, 939-3599
**
** File that contains the following assembly language subroutines:
**
**    $subdec_vert
**    $subdec_horiz
**
*****************************************************************************/

        .global $subdec_vert
        .global $subdec_horiz

        .align 2048


/*******************************************************************
*
* Function : $subdec_vert
* Args : outerloop,innerloop
* Passed in : (d1,d2)
*
* Description:
*   outerloop - the width of the coefficient patch
*   innerloop - the height of the coefficient patch
*
*   $subdec_vert will be called to perform the wavelet
*   decomposition in the vertical direction.
*
* Return Values:
*   None
*
*******************************************************************/


$subdec_vert:
        lctl = 0x0              ;reset looping capability
        lr0 = d1 - 1            ;outerloop
        lr1 = d2 - 3            ;innerloop
        a4 = d1
        d7 = 0                  ; k = 0
        le1 = InnerLoopEnd
        ls1 = InnerLoop
        le0 = OuterLoopEnd
        ls0 = OuterLoop
        nop
        lctl = 0xa9             ;associate le0 with lc0 and le1 with lc1

        d3 = a4
        d4 = d3 + d3
        x1 = d4
        d4 = d4 + d3
        x2 = d4

OuterLoop:
; img[index][k] = (img[index][k]>>1) - ((img[0][k]+img[2*index][k])>>2);
        nop
        x0 = d7
        nop
        a0 =h &*(dba + [x0])
        nop
        x0 = a4
        d2 =sh *(a0 + [x1])
        d3 =sh *(a0)
        d4 =sh *(a0 + [x0])
     || d2 = d2 + d3
        d4 = (d4 >>s 1)
        d4 = d4 - (d2 >>s 2)
        *(a0 + [x0]) =h d4

; img[0][k] += img[index][k];
        d3 = d3 + d4
        *(a0++=[x0]) =h d3
        nop

InnerLoop:
; img[2*l+index][k] = (img[2*l+index][k]>>1) -
;                     ((img[2*l][k]+img[2*l+2*index][k])>>2);
        d3 =sh *(a0 + [x0])
        d4 =sh *(a0 + [x2])
        d2 =sh *(a0 + [x1])
     || d5 = d3 + d4
        d2 = (d2 >>s 1)
        d2 = d2 - (d5 >>s 2)
        *(a0 + [x1]) =h d2

; img[2*l][k] += ((img[2*l+index][k]+img[2*l-index][k]+2)>>1);
        d4 =sh *(a0++=[x0])
        d5 = d2 + d4
        d3 = d3 + (d5 >>s 1)
        *(a0++=[x0]) =h d3

InnerLoopEnd:
        nop

; img[YSIZE-index][k] = (img[YSIZE-index][k]-img[YSIZE-2*index][k])>>1;
        d2 =sh *(a0 + [x1])
        d3 =sh *(a0 + [x0])
        d2 = d2 - d3
        d2 = (d2 >>s 1)
        *(a0 + [x1]) =h d2

; img[YSIZE-2*index][k] += ((img[YSIZE-index][k]+img[YSIZE-3*index][k]+2)>>1)
        d4 =sh *(a0)
        d5 = d2 + d4
        d3 = d3 + (d5 >>s 1)
        *(a0 + [x0]) =h d3

OuterLoopEnd:
        d7 = d7 + 1
        br = iprs
        nop
        nop

/*******************************************************************
*
* Function : $subdec_horiz
* Args : outerloop,innerloop
* Passed in : (d1,d2)
*
* Description:
*   outerloop - the width of the coefficient patch
*   innerloop - the height of the coefficient patch
*
*   $subdec_horiz will be called to perform the wavelet
*   decomposition in the horizontal direction.
*
* Return Values:
*   None
*
*******************************************************************/


$subdec_horiz:
        lctl = 0x0              ;reset looping capability
        lr0 = d1 - 1
        lr1 = d2 - 3
        a4 = d2                 ; innerloop
        d7 = 0                  ; k = 0
        le1 = InnerLoopEnd2
        ls1 = InnerLoop2
        le0 = OuterLoopEnd2
        ls0 = OuterLoop2

        nop
        lctl = 0xa9             ;associate le0 with lc0 and le1 with lc1

        nop
        nop

OuterLoop2:
; img[k][index] -= ((img[k][0]+img[k][2*index])>>1);
; a0 a1 a2

        d3 = a4
        d3 = d3 * d7
        d3 = d3 << 1
        x0 = d3
        nop
        a0 =h &*(dba + [x0])
        nop
        d3 =sh *(a0)
        d2 =sh *(a0 + [2])
        d4 =sh *(a0 + [1])
     || d2 = d2 + d3
        d4 = d4 - (d2 >>s 1)
        *(a0 + [1]) =h d4

; img[k][0] = (img[k][0]<<1) + img[k][index];
        d4 = d4 + (d3 << 1)
        *(a0++=[1]) =h d4
        nop

InnerLoop2:
; img[k][2*l+index] -= ((img[k][2*l]+img[k][2*l+2*index])>>1);
        d3 =sh *(a0 + [1])
        d4 =sh *(a0 + [3])
        d2 =sh *(a0 + [2])
     || d5 = d3 + d4
        d2 = d2 - (d5 >>s 1)
        *(a0 + [2]) =h d2

; img[k][2*l] = (img[k][2*l]<<1) + ((img[k][2*l+index]+img[k][2*l-index])>>1);
        d4 =sh *(a0++=[1])
        d5 = d2 + d4
        d3 = d3 << 1
        d3 = d3 + (d5 >>s 1)
        *(a0++=[1]) =h d3

InnerLoopEnd2:
        nop

; img[k][XSIZE-index] -= img[k][XSIZE-2*index];
        d2 =sh *(a0 + [2])
        d3 =sh *(a0 + [1])
        d2 = d2 - d3
        *(a0 + [2]) =h d2

; img[k][XSIZE-2*index] = (img[k][XSIZE-2*index]<<1) + ((img[k][XSIZE-index]+img[k][XSIZE-3*index])>>1);
        d4 =sh *(a0)
        d5 = d2 + d4
        d3 = d3 << 1
        d3 = d3 + (d5 >>s 1)
        *(a0 + [1]) =h d3

OuterLoopEnd2:
        d7 = d7 + 1
        br = iprs
        nop
        nop
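
The comments in $subdec_vert spell out the lifting steps in C notation. As a reading aid, the sketch below applies those steps to a single column with index = 1. The function name lift_column and the plain int array are assumptions made for illustration and are not part of the original program. The sketch follows the comments, including the +2 rounding term that the assembly above does not appear to add, and it assumes that >> on a negative int is an arithmetic shift, as the PP's >>s is.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: one vertical lifting pass over a single column
 * x[0..n-1] with index = 1, following the C comments in $subdec_vert.
 * Odd samples become detail values, even samples become smoothed values. */
void lift_column(int *x, size_t n)
{
    size_t i;

    assert(n >= 4 && n % 2 == 0);

    /* first pair: x[0] has no neighbour above it */
    x[1] = (x[1] >> 1) - ((x[0] + x[2]) >> 2);
    x[0] += x[1];

    /* interior pairs */
    for (i = 2; i + 2 < n; i += 2) {
        x[i + 1] = (x[i + 1] >> 1) - ((x[i] + x[i + 2]) >> 2);
        x[i] += (x[i + 1] + x[i - 1] + 2) >> 1;
    }

    /* last pair: x[n-1] has no neighbour below it */
    x[n - 1] = (x[n - 1] - x[n - 2]) >> 1;
    x[n - 2] += (x[n - 1] + x[n - 3] + 2) >> 1;
}
```

On a constant column every odd (detail) sample comes out as zero, which is the behaviour expected of the high-pass half of the decomposition.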
/*****************************************************************************
**
** j_enc.c (PP Program)
**
** Written by Jim Witham, Code 472300D, 939-3599
**
** PP Program that orchestrates the generation of the bitstream. Processes
** 6 Zero Trees per partition.
**
*****************************************************************************/

#include <mvp.h> 

#include "arith.h" 

#include "modlp.h" 

#define ASSEM 
#define ASSEM3 
#define ASSEM2 

#define XSIZE 512 
#define YSIZE 256 
#define NSCALES 5 
#define AX 16 /* = XSIZE/BS */
#define AY 8 /* = YSIZE/BS */
#define BS 32 /* = 2 ^ NSCALES */

#define maxsyms 240 

unsigned char sym_array[256];
unsigned short sym_index;

int stophere;

unsigned char PASS_ENC;
unsigned short TMASK;
int SHFT_DN,RB_ENC;

unsigned char buffer; /* Bits buffered for output */
unsigned char bits_to_go; /* # bits free in buffer */

unsigned char pruned_children[8]; /* used to keep count of the number of pruned */
                                  /* children within a zero tree */
unsigned char total_links; /* keeps count of pruned trees */
unsigned char link_list[6]; /* list with unpruned trees in it */
unsigned short list_index;

unsigned char *pp_stop_encode = (unsigned char *) 0x010007D4;

/* CURRENT STATE OF ENCODING */

int low, high; /* Ends of current code region */
short bits_to_follow; /* Number of opposite bits to output
                         after the next bits */

extern shared int local_maxval[4];
extern shared int global_maxval;

extern short *img;
extern short *coeff_block;
extern short *stats_val;
extern short *ztl;
extern unsigned char *stats_flag;
extern unsigned int *stomp_flag;
extern unsigned int *stomp_ztl;
extern unsigned char *byte_stream;
extern unsigned char *tbuf;
extern unsigned short *list;
extern int No_of_chars;
extern int EOF_symbol; /* Index of EOF symbol */
extern int No_of_symbols; /* total # of symbols */

/* TRANSLATION TABLES BETWEEN CHARS AND SYMBOL INDEXES */
extern unsigned char char_to_index[Max_No_of_chars];
extern unsigned char index_to_char[Max_No_of_symbols+1];

extern short int cum_freq[Max_No_of_symbols+1];

extern int buffer_index;
extern unsigned short bit_index;
extern unsigned short T_BYTES;
extern short maxval;
extern unsigned short *ALLOC;

extern int FIRST_ENC, FCNT_ENC; /* 1st pass bite size */
extern unsigned char *passes_d;

/* Required on-line storage for high speed */

extern int STOP;
extern int BYTE_CNT;
extern unsigned short THRESH;
extern int BITE;
extern int SIG_COEF;

extern int whoami(); /* Function will return the PP number */

void start_model(int nchars);
void update_model(int symbol);
void encode_symbol(int symbol);
void bit_plus_follow(int bit);
void output_bit(int bit);

void new_dbits(int m, int c, int number, int s);
void new_dbits2(int m, int s);
void comp_ztr(int s);
void subpass(int t);

/*******************************************************************
*
* Function : start_encoding
* Args : none
*
* Description:
*   start_encoding will be called to initialize the
*   arithmetic coder
*
* Return Values:
*   None
*
*******************************************************************/

void start_encoding()
{
    low = 0; /* Full code range */
    high = Top_value;
    bits_to_follow = 0; /* No bits to follow */
}

/*******************************************************************
*
* Function : start_outputing_bits
* Args : none
*
* Description:
*   start_outputing_bits will be called to initialize the
*   bit buffer
*
* Return Values:
*   None
*
*******************************************************************/

void start_outputing_bits()
{
    buffer = 0; /* Buffer is empty at start */
    bits_to_go = 8;
}


/*******************************************************************
*
* Function : input_block
* Args : s - subblock number (0-5)
*        p - pair number (0,1)
*
* Description:
*   input_block will be called to copy the coefficients from
*   one section of memory in in-place format to another
*   place in memory in zero-tree format
*
* Return Values:
*   None
*
*******************************************************************/

void input_block(s,p)
int s,p;
{
    int i,j,k,l;

    i = s * 256;

    for (k=NSCALES-2;k>0;k--)
    {
        if (k==(NSCALES-2))
            for (j=0;j<(1<<(NSCALES-1));j+=(1<<k))
                for (l=0;l<(1<<NSCALES);l+=(2<<k))
                    stats_val[i++] = coeff_block[p*512 + j*32+l];

        for (j=0;j<(1<<(NSCALES-1));j+=(1<<k))
            for (l=(1<<k);l<(1<<NSCALES);l+=(2<<k))
                stats_val[i++] = coeff_block[p*512 + j*32+l];

        for (j=(1<<(k-1));j<(1<<(NSCALES-1));j+=(1<<k))
            for (l=0;l<(1<<NSCALES);l+=(2<<k))
                stats_val[i++] = coeff_block[p*512 + j*32+l];

        for (j=(1<<(k-1));j<(1<<(NSCALES-1));j+=(1<<k))
            for (l=(1<<k);l<(1<<NSCALES);l+=(2<<k))
                stats_val[i++] = coeff_block[p*512 + j*32+l];
    }
}

/*******************************************************************
*
* Function : comp_dbits
* Args : m - index of coefficient
*        c1- first child index
*        c2- second child index
*        c3- third child index
*        s - subblock number (0-5)
*
* Description:
*   comp_dbits will be called to determine whether the root of the
*   zero tree passes the threshold. If it does, it is tagged as
*   significant and its value is modified. Its children are then
*   checked to see if any coefficients in its line are above the
*   threshold; if none are, a passdown flag is given to the
*   children to signify no significance under them.
*
* Return Values:
*   None
*
*******************************************************************/

void comp_dbits(m,c1,c2,c3,s)
int m,c1,c2,c3,s;
{
    int sym,tflg,t,m0;

    m0 = m;
    m = m + (s * 256);

    if ((stats_flag[m]&6)==0)
    {
        tflg = ((TMASK&ztl[m0 + (s * 64)]) == 0);
        t = (TMASK&abs(stats_val[m]))>>SHFT_DN;
        sym = t + (((t!=0)&&(stats_val[m]>0)) ? ((1<<BITE)-1) : 0) + ((-tflg)&1);

        if ((t>0)&&(list_index<254))
            list[list_index++] = m;

        sym_array[sym_index++] = sym;
        if (sym_index > maxsyms)
        {
            do_syms();
            sym_index = 0;
        }

        stats_val[m] = (t>0) ? (abs(stats_val[m]) - (((t*THRESH)>>(BITE-1)) + (THRESH>>BITE))) :
                               stats_val[m];

        if (tflg!=0)
        {
            pruned_children[s] = 1;
            stats_flag[(s*256)+c1] = stats_flag[(s*256)+c1] | (tflg<<1);
            stats_flag[(s*256)+c2] = stats_flag[(s*256)+c2] | (tflg<<1);
            stats_flag[(s*256)+c3] = stats_flag[(s*256)+c3] | (tflg<<1);
        }

        stats_flag[m] = (t!=0)<<2;
    }
    else if ((stats_flag[m]&2)!=0)
    {
        stats_flag[m] = stats_flag[m] & 252;
        stats_flag[(s*256)+c1] = stats_flag[(s*256)+c1] | 2;
        stats_flag[(s*256)+c2] = stats_flag[(s*256)+c2] | 2;
        stats_flag[(s*256)+c3] = stats_flag[(s*256)+c3] | 2;
        pruned_children[s] = 1;
    }
}

/******************************************************************* 
* * 

* Function : p_code * 

* Args : none * 

* * 

* Description: * 

* p_code will be called to coordinate the generation of the * 

* bitstream from 6 sets of subblocks * 


* * 

* Return Values: * 

* None * 

* * 


*******************************************************************/ 

void p_code()
{
    int k,l,i,j,mask,bite,pmin;

    sym_index = 0;

    /* wait for coefficients to come in to determine maxval */
    maxval = 0;
    while ((INTFLG&(1<<20))==0);
    INTFLG = 1<<20;

    /* Calculate local maximum value of coefficients */
    for (k=0;k<128;k++)
        if (abs(stats_val[k]) > maxval)
            maxval = abs(stats_val[k]);

    local_maxval[whoami()] = maxval;

    /* tell MP done with maxval calculation */
    asm(" x2 = 0x00002100");
    asm(" cmnd = x2");

    /* wait for global maxval computation to complete */
    while ((INTFLG&(1<<20))==0);
    INTFLG = 1<<20;

    maxval = 1<<global_maxval;

    /* Determine number of bitplanes to process on first and */
    /* successive passes */
    FIRST_ENC = tbuf[0];

    if (FIRST_ENC == 6)
    {
        bite = 3;
        RB_ENC = 3;
    }
    else if (FIRST_ENC == 5)
    {
        bite = 3;
        RB_ENC = 2;
    }
    else if (FIRST_ENC == 4)
    {
        bite = 2;
        RB_ENC = 2;
    }
    else
    {
        bite = FIRST_ENC;
        RB_ENC = 1;
    }

    /* Compute mask */
    mask = maxval;
    for(k=1;k<bite;k++)
        mask = mask | (maxval>>k);

    pmin = 100;

    /* tell MP this PP is ready to encode the bit stream */
    asm(" x2 = 0x00002100");
    asm(" cmnd = x2");

    /* Main Loop: Continue until stop condition is reached */
    while (*pp_stop_encode == 0)
    {
        BITE = bite;
        PASS_ENC = 0;
        STOP = 0;
        BYTE_CNT = 0;
        THRESH = maxval;
        SHFT_DN = global_maxval - BITE + 1;
        TMASK = mask;
        buffer_index = 0;
        list_index = 0;

        start_model(1<<(BITE+1));
        start_outputing_bits();
        start_encoding();

        /* wait for new coefficients */
        while ((INTFLG&(1<<20))==0);
        INTFLG = 1<<20;

        if (*pp_stop_encode == 0)
        {
            for (i=0;i<6;i=i+2)
            {
                input_block(i,0);
                comp_ztr(i);
                input_block(i + 1,1);
                comp_ztr(i + 1);

                if (i!=4)
                {
                    i = i + 1;
                    i = i - 1;

                    /* tell MP done inputting this block of coefficients */
                    asm(" x2 = 0x00002100");
                    asm(" cmnd = x2");

                    /* wait for new coefficients */
                    while ((INTFLG&(1<<20))==0);
                    INTFLG = 1<<20;
                }
            }
            T_BYTES = ALLOC[0];

            for (bit_index=0;bit_index<T_BYTES;bit_index++) byte_stream[bit_index]
            bit_index = 0;

            for(j=0;j<384;j++)
                stomp_flag[j] = 0;

            while (STOP == 0)
            {
                code_dlist2();
            }

            if (ALLOC[0]<3)
                passes_d[0] = 0;
            else if (ALLOC[0]==3)
                passes_d[0] = PASS_ENC-4;
            else if (ALLOC[0]<7)
                passes_d[0] = PASS_ENC-2;
            else
            {
                passes_d[0] = PASS_ENC-1;
                if (PASS_ENC-1 < pmin)
                    pmin = PASS_ENC-1;
            }

            /* tell MP done with this block of pixels */
            asm(" x2 = 0x00002100");
            asm(" cmnd = x2");
        } /* end if */
    } /* end while */

    /* Inform MP of number of passes at this BITE size */

    if (pmin==0)
        FIRST_ENC -= 1;
    else if ((pmin > FIRST_ENC) && (FCNT_ENC > 2) && (pmin<7))
    {
        FCNT_ENC = 0;
        FIRST_ENC = pmin-1;
    }
    else if (pmin > FIRST_ENC)
        FCNT_ENC++;
    else if (pmin<=FIRST_ENC)
    {
        FIRST_ENC = pmin;
        FCNT_ENC = 0;
    }

    tbuf[1] = FIRST_ENC;
}
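
p_code() picks how many bit planes to group into the first pass (bite) and into later passes (RB_ENC) from FIRST_ENC, then builds a mask covering the top bite bit planes below maxval. The sketch below isolates those two steps so they can be checked by hand. The names bite_plan, choose_bites and threshold_mask are illustrative assumptions and do not appear in the original program.

```c
#include <assert.h>

/* Hypothetical container for the two bite sizes chosen in p_code(). */
struct bite_plan {
    int first;   /* bit planes coded on the first pass (bite)   */
    int rest;    /* bit planes coded on later passes (RB_ENC)  */
};

/* Same selection table as the if/else chain in p_code(). */
struct bite_plan choose_bites(int first_enc)
{
    struct bite_plan p;

    if (first_enc == 6)      { p.first = 3; p.rest = 3; }
    else if (first_enc == 5) { p.first = 3; p.rest = 2; }
    else if (first_enc == 4) { p.first = 2; p.rest = 2; }
    else                     { p.first = first_enc; p.rest = 1; }
    return p;
}

/* OR together `bite` consecutive bit planes, starting at maxval,
 * which p_code() always sets to a power of two. */
unsigned threshold_mask(unsigned maxval, int bite)
{
    unsigned mask = maxval;
    int k;

    assert(maxval != 0 && (maxval & (maxval - 1)) == 0);
    for (k = 1; k < bite; k++)
        mask |= maxval >> k;
    return mask;
}
```

For example, with maxval = 1<<10 and a bite of 3, the mask covers bits 10, 9 and 8, that is 0x700.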
/*****************************************************************************
**
** main.c (PP Program)
**
** Written by Jim Witham, Code 472300D, 939-3599
**
** Main PP Program that calls all the other routines to perform the wavelet
** decomposition and bitstream generation.
**
*****************************************************************************/

#include "modlp.h"

#define NSCALES 4

unsigned short T_BYTES;
unsigned char *pic;
short *img;
short *stats_val;
short *stats_apx;
unsigned char *stats_flag;
unsigned int *stomp_flag;
unsigned int *stomp_ztl;
unsigned char *byte_stream;
short *coeff_block;
unsigned short *ztl;
unsigned char *tbuf;
unsigned short *ALLOC;
unsigned short *list;

int No_of_chars;
int EOF_symbol; /* Index of EOF symbol */
int No_of_symbols; /* total # of symbols */

/* TRANSLATION TABLES BETWEEN CHARS AND SYMBOL INDEXES */

unsigned char char_to_index[Max_No_of_chars]; /* JCW old version was int */
unsigned char index_to_char[Max_No_of_symbols+1]; /* JCW old version was int */

short int cum_freq[Max_No_of_symbols+1]; /* JCW old version was int */

int buffer_index;

extern void subdec_vert(int outerloop,int innerloop);
extern void subdec_horiz(int outerloop,int innerloop);
extern int calc_n_sub_mean(int oldmean);
extern int whoami();

/* ************************* MAIN PROGRAM ************************** */

cregister extern volatile unsigned int INTFLG;

short column_outerloop[5] = { 8, 16, 32, 16, 8 };
short row_outerloop[5] = { 4, 8, 16, 8, 4 };
short column_innerloop[5] = { 120, 60, 30, 15, 8 };
short row_innerloop[5] = { 256, 128, 64, 32, 16 };

unsigned char vert_loop_index[6] = { 16, 4, 1, 1, 1 };
unsigned char horiz_loop_index[6] = { 15, 5, 1, 1, 1 };

short maxval;

extern shared int mean_pp[4];
extern shared int global_mean;

unsigned char *passes_d;

extern unsigned char PASS_ENC,PASS_DEC;

int STOP;
int BYTE_CNT;
unsigned short THRESH;
int BITE;
int SIG_COEF;

int FIRST_ENC,FIRST_DEC;
int FCNT_ENC,FCNT_DEC;

unsigned int getaway_address;
unsigned char quick_getaway = 0;

main()
{
    int k,l;
    int mean;

    /* initialize pointers */

    asm("    d1 = &*(dba)");
    asm("    *(xba+$pic) = d1");
    asm("    *(xba+$img) = d1");
    asm("    *(xba+$stats_val) = d1");
    asm("    *(xba+$stats_apx) = d1");
    asm("    d1 = &*(dba + 0x8000)");
    asm("    *(xba+$stats_flag) = d1");
    asm("    *(xba+$stomp_flag) = d1");
    asm("    *(xba+$coeff_block) = d1");
    asm("    x0 = 0x8602");
    asm("    nop");
    asm("    d1 = &*(dba + x0)");
    asm("    *(xba+$list) = d1");
    asm("    d1 = &*(dba + 0xc00)");
    asm("    *(xba+$ztl) = d1");
    asm("    *(xba+$stomp_ztl) = d1");
    asm("    d1 = &*(pba + 0x630)");
    asm("    *(xba+$byte_stream) = d1");
    asm("    d1 = &*(pba + 0x620)");
    asm("    *(xba+$passes_d) = d1");
    asm("    d1 = &*(pba + 0x5fc)");
    asm("    *(xba+$tbuf) = d1");
    asm("    d1 = &*(pba + 0x600)");
    asm("    *(xba+$ALLOC) = d1");

    FIRST_ENC = 4;
    FIRST_DEC = 4;
    FCNT_ENC = 0;
    FCNT_DEC = 0;

    /* Clear the message interrupt flag that comes from the MP, just in case */
    INTFLG = 1<<20;
    quick_getaway = 0;

    while (1)
    {
        mean = 0;
        /* Decompose image */
        for(k=0;k<(NSCALES+1);k++)
        {
            for(l=0;l<horiz_loop_index[k];l++)
            {
                /* wait for new pixels */
                while ((INTFLG&(1<<20))==0);
                INTFLG = 1<<20;

                /* if we are on the first pass that means we have 8 bit
                   pixels that have come in that need to be zero meaned
                   and converted to 16 bit signed numbers */
                if (k==0)
                    mean = mean + calc_n_sub_mean(global_mean);

                subdec_horiz(row_outerloop[k],row_innerloop[k]);

                /* tell MP done with this block of pixels */
                asm(" d7 = 0x00002100");
                asm(" cmnd = d7");
            }

            /* Put out local mean for MP to do global mean calculation */
            mean_pp[whoami()] = mean;

            if (k!=NSCALES)
                for(l=0;l<vert_loop_index[k];l++)
                {
                    /* wait for new pixels */
                    while ((INTFLG&(1<<20))==0);
                    INTFLG = 1<<20;

                    subdec_vert(column_outerloop[k],column_innerloop[k]);

                    /* tell MP done with this block of pixels */
                    asm(" d7 = 0x00002100");
                    asm(" cmnd = d7");
                }
        }

        /* Generate the bitstream */
        p_code();
    }
}
/*****************************************************************************
**
** mean2.s (PP Program)
**
** Written by Jim Witham, Code 472300D, 939-3599
**
** File that contains the following assembly language subroutines:
**
**     $calc_n_sub_mean
**     $whoami
**
*****************************************************************************/

        .global $calc_n_sub_mean
        .global $whoami

/*******************************************************************
*
* Function : $calc_n_sub_mean
* Args : oldmean
* Passed in : d1
*
* Description:
*   oldmean - the mean of the previous image
*
*   $calc_n_sub_mean will be called both to calculate the sum
*   of the pixels in this patch and to subtract off the
*   previous image's mean (an 8 bit value minus a 16 bit value,
*   yielding a 16 bit result), which is then further processed
*   by shifting it to the left 2 places.
*
* Return Values:
*   d5 - returns the sum of this patch of pixels.
*
*******************************************************************/

$calc_n_sub_mean:
        d5 = 0
        a8 = &*(dba) + 4094
        a0 = &*(dba) + 2047
        d1 = 0 - (d1 << 2)      ; make mean so it can be subtracted
        le2 = loopend
        lrs2 = 510

        d2 =ub *a0--
        d5 = d5 + d2
        x1 = d1 + (d2 << 2)
     || d2 =ub *a0--
        *a8-- =h x1
     || d5 = d5 + d2
        x1 = d1 + (d2 << 2)
     || d2 =ub *a0--
        *a8-- =h x1
     || d5 = d5 + d2
        x1 = d1 + (d2 << 2)
     || d2 =ub *a0--
        *a8-- =h x1
     || d5 = d5 + d2
        x1 = d1 + (d2 << 2)
     || d2 =ub *a0--
loopend:
        *a8-- =h x1
     || d5 = d5 + d2
        x1 = d1 + (d2 << 2)
     || d2 =ub *a0--
        *a8-- =h x1
     || d5 = d5 + d2
        x1 = d1 + (d2 << 2)
     || d2 =ub *a0--
        *a8-- =h x1
     || d5 = d5 + d2
        x1 = d1 + (d2 << 2)
     || d2 =ub *a0--
        *a8-- =h x1
     || d5 = d5 + d2
        x1 = d1 + (d2 << 2)
        *a8-- =h x1

        br = iprs
        nop
        nop


/*******************************************************************
*
* Function : $whoami
* Args : none
* Passed in :
*
* Description:
*   $whoami will be called to determine which PP this is.
*
* Return Values:
*   d5 - returns the PP number assigned to this PP.
*
*******************************************************************/

$whoami:
        d5 = comm & 0x03
        br = iprs
        nop
        nop
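
For readers who do not follow PP assembly, the sketch below restates what $calc_n_sub_mean appears to do, in C. It is a reading of the listing and not the original code: the name calc_n_sub_mean_c, the explicit length argument and the int16_t buffer type are assumptions. The listing starts its read pointer at byte 2047 and its write pointer at byte 4094 of the same data base, which suggests that a 2048-pixel patch is widened in place from 8 to 16 bits, working from the end of the buffer toward the start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C rendering of $calc_n_sub_mean. buf holds n unsigned 8-bit
 * pixels in its first n bytes and has room for n 16-bit results.
 * Returns the sum of the n pixels. */
long calc_n_sub_mean_c(int16_t *buf, size_t n, int oldmean)
{
    const uint8_t *src = (const uint8_t *)buf;
    long sum = 0;
    size_t i;

    assert(buf != NULL);

    /* Walk backwards so each 16-bit store lands on bytes already consumed. */
    for (i = n; i-- > 0; ) {
        int pixel = src[i];

        sum += pixel;
        buf[i] = (int16_t)((pixel - oldmean) * 4);  /* same scaling as << 2 */
    }
    return sum;
}
```

The backward walk is what makes the in-place widening safe: the store to element i touches bytes 2i and 2i+1, and every pixel still to be read sits at a lower byte offset.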
/*******************************************************************
**
** modlp.c (PP Program)
**
** Written by Jim Witham, Code 472300D, 939-3599
**
** Contains start_model subroutine. This subroutine initializes
** the translation tables and the frequency counters for the
** arithmetic coder.
**
*******************************************************************/

#include "modlp.h"
#include "arith.h"

unsigned short bit_index;

short int freq[Max_No_of_symbols+1]; /* Symbol frequencies */

extern int No_of_chars;
extern int EOF_symbol; /* Index of EOF symbol */
extern int No_of_symbols; /* total # of symbols */

/* TRANSLATION TABLES BETWEEN CHARS AND SYMBOL INDEXES */

extern unsigned char char_to_index[Max_No_of_chars];
extern unsigned char index_to_char[Max_No_of_symbols+1];

extern short int cum_freq[Max_No_of_symbols+1];

extern int low,high;
extern short bits_to_follow;
extern unsigned char bits_to_go;
extern unsigned char buffer;
extern int BYTE_CNT;
extern unsigned short T_BYTES;
extern unsigned char *byte_stream;
extern int buffer_index;
extern int STOP;


/*******************************************************************
*
* Function : start_model
* Args : int nchars
*
* Description:
*   nchars - the number of symbols to be encoded by the model
*
*   start_model will be called when the number of characters
*   to be encoded by the model changes. It re-initializes
*   the tables used by the encoder.
*
* Return Values:
*   None
*
*******************************************************************/

void start_model(nchars)
int nchars;
{
    int i;

    /* Initialize number of chars */
    No_of_chars = nchars;
    No_of_symbols = nchars;

    /* Setup translation tables */
    for (i=0;i<No_of_chars;i++)
    {
        char_to_index[i] = i+1;
        index_to_char[i+1] = i;
    }

    /* Initialize frequency counts */
    for(i=0;i<=No_of_symbols;i++)
    {
        freq[i] = 1;
        cum_freq[i] = No_of_symbols-i;
    }

    freq[0] = 0;
}
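
The initialization in start_model gives every symbol the same starting probability. The sketch below repeats that initialization on local tables and checks the resulting layout. MAX_SYMS, freq_tab, cum_tab, init_counts and symbol_width are illustrative names, not part of the original program, and the sketch assumes the usual arithmetic-coder convention in which symbol index s owns the interval between cum_freq[s] and cum_freq[s-1]; the encode_symbol routine that would confirm this is not shown in the listing.

```c
#include <assert.h>

#define MAX_SYMS 16   /* hypothetical stand-in for Max_No_of_symbols */

static short freq_tab[MAX_SYMS + 1];
static short cum_tab[MAX_SYMS + 1];

/* Same initialization as start_model(), applied to local tables. */
void init_counts(int nsyms)
{
    int i;

    assert(nsyms >= 1 && nsyms <= MAX_SYMS);
    for (i = 0; i <= nsyms; i++) {
        freq_tab[i] = 1;
        cum_tab[i] = (short)(nsyms - i);
    }
    freq_tab[0] = 0;
}

/* Width of the cumulative-count interval for symbol index s (1..nsyms). */
int symbol_width(int s)
{
    return cum_tab[s - 1] - cum_tab[s];
}
```

After init_counts(8), cum_tab[0] holds the total count of 8, cum_tab[8] is 0, and each of the eight symbols has an interval of width 1.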
/* DECLARATIONS FOR ARITHMETIC CODING AND DECODING */

/* Size of arithmetic code values */

#define Code_value_bits 9 /* Number of bits in a code value */

typedef short code_value; /* Type of arithmetic code value */

#define Top_value (((long)1<<Code_value_bits)-1) /* Largest code val */

/* Half and Quarter points in code value range */

#define First_qtr (Top_value/4+1) /* Points after first quarter */
#define Half (2*First_qtr) /* Points after first half */
#define Third_qtr (3*First_qtr) /* Points after third quarter */
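
With Code_value_bits set to 9, the macros above evaluate to Top_value = 511, First_qtr = 128, Half = 256 and Third_qtr = 384. The sketch below repeats the definitions so that the values can be checked, and adds a small helper, code_quarter, that reports which quarter of the code range a value falls in. The helper is a hypothetical illustration and is not part of the original header.

```c
#include <assert.h>

#define Code_value_bits 9
#define Top_value (((long)1 << Code_value_bits) - 1)
#define First_qtr (Top_value / 4 + 1)
#define Half      (2 * First_qtr)
#define Third_qtr (3 * First_qtr)

/* Hypothetical helper: quarter (0..3) of [0, Top_value] that v lies in. */
int code_quarter(long v)
{
    assert(v >= 0 && v <= Top_value);
    if (v < First_qtr) return 0;
    if (v < Half)      return 1;
    if (v < Third_qtr) return 2;
    return 3;
}
```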
/*****************************************************************************
**
** mp.c (MP Program)
**
** Written by Jim Witham, Code 472300D, 939-3599
**
** MP Program that orchestrates data movement and kicks off PP's to
** implement the balanced wavelet algorithm written by Chuck Creusere.
** This version is meant to be run from the PC host.
**
*****************************************************************************/

#include <stdlib.h>
#include <stdio.h>
#include <mvp.h>

#include <mvp_hw.h>
#include <mp_ptreq.h>

#include "icl.h"
#include "vil24.h"
#include "vol.h"
#include "bgl.h"
#include "pcic80.h"
#include "cil.h"


/* define compression ratio. Ratio is actually #:1 compression */
#define compression 40

#define HOST
#define CAMERA
#define ENCODE
#define ENCODE_STREAM
#define DISPLAY
#define SHOWDISPLAY

#define headersize 0

#define XSIZE 512
#define YSIZE 240

/* Number of Subblocks in the X direction */
#define AX 16

/* Number of Subblocks in the Y direction */
#define AY 15

/*
** Define a bit mask for host request bit
*/
#define SigBit(x) (((UINT32)1)<<(x))
#define HostRequestBitMask SigBit(8)
#define FrameDoneBit 8

/**************************************************************************/
/* A pointer to the array of pixels for the number characters */
extern int *number_pixels;

/* Place certain variables in MP Parameter RAM for the sake of speed */

#pragma DATA_SECTION(Semaphore, "mp_vars")
long Semaphore;

#pragma DATA_SECTION(encode_time, "mp_vars")
#pragma DATA_SECTION(decode_time, "mp_vars")
int encode_time,decode_time,proc_time;

#pragma DATA_SECTION(start_time, "mp_vars")
unsigned int start_time;

#pragma DATA_SECTION(local_alloc, "mp_vars")
unsigned short local_alloc[AX*AY/6];

#pragma DATA_SECTION(comp_table, "mp_vars")
unsigned char comp_table[5] = {10, 20, 40, 80, 100};

#pragma DATA_SECTION(last_comp, "mp_vars")
unsigned char last_comp = 2;

#pragma DATA_SECTION(compression_ratio, "mp_vars")
UINT32 *compression_ratio;

/* Place shared variables in PP Parameter RAM where both the PPs and MP can get to them */

#pragma DATA_SECTION(mean_pp, "sh_vars")
#pragma DATA_SECTION(global_mean, "sh_vars")
shared int mean_pp[4];
shared int global_mean;

#pragma DATA_SECTION(local_maxval, "sh_vars")
#pragma DATA_SECTION(global_maxval, "sh_vars")
shared int local_maxval[4];
shared int global_maxval;

static void SignalHandler(UINT32 Signals);
static void init_alloc_table(UINT32 index);
static void init_alloc_table2(UINT32 index);


/**************************************************************************/

void task(void *arg)
{
    unsigned char times=0;
    unsigned char flip=0;
    unsigned int stream_addr[2];

    PCIC80STAT ReturnVal;

    PVIL24 pVil24;
    ICL_IMG *vim,*f0,*f1;
    unsigned int dx,dy;
    int i,j;
    int lp;

    unsigned char vert_loop_index[6],horiz_loop_index[6];
    int jump_col[6];
    long jump_row[6];

    long *unused_pixels;
    long *ptr; /* temp pointer */
    PTREQ *p[10]; /* temp pointer to packet transfer structure */

    int temp_maxval;

    /* JCWtest */
    char stringl[32];
    int tempostr;
    float temp_time;
    int junki;
    unsigned short junkfill = 0;
    unsigned short *fillme = (unsigned short *) 0x90320000;

    int k,l;
    long tempi;
    unsigned char tbuf;

    int t_alloc;
    int p_alloc;
    int r_alloc;

    unsigned char *tbuf_pp0 = (unsigned char *) 0x010005fc;
    unsigned char *tbuf_pp1 = (unsigned char *) 0x010015fc;
    unsigned char *tbuf_pp2 = (unsigned char *) 0x010025fc;
    unsigned char *tbuf_pp3 = (unsigned char *) 0x010035fc;
    unsigned char min_first;

    unsigned char ppnexttask[4];
    unsigned short ppalloc[4];
    unsigned char current_block;
    unsigned int table_pointer = 0x80380000;
    unsigned int table_address[4];
    unsigned short table_size[4];
    unsigned char pprequesting;
    unsigned char ppinfo[4];
    unsigned char ppdone;

    unsigned char *pp_stop_encode = (unsigned char *) 0x010007D4;
    unsigned char DONE1;

    UINT32 temp_ulong;

    Semaphore = 0;
    comp_table[0] = 10;
    comp_table[1] = 20;
    comp_table[2] = 40;
    comp_table[3] = 80;
    comp_table[4] = 100;
    last_comp = 2;

    compression_ratio = (UINT32 *) 0x010107D0;

    stream_addr[0] = 0x90023000;
    stream_addr[1] = 0x90028000;

    NOCACHE_INT(number_pixels[0]) = 0x80;
    NOCACHE_INT(global_mean) = 0x5e;

    *pp_stop_encode = 0;

    /* initialize FIRST variable on all PPs - used for first pass # of bitplanes to process */
    tbuf_pp0[0] = 4;
    tbuf_pp1[0] = 4;
    tbuf_pp2[0] = 4;
    tbuf_pp3[0] = 4;

#ifdef HOST
    ReturnVal = CilSigHandler(SignalHandler);
    if (ReturnVal != CIL_OK)
    {
        /*** Cannot register signal handler ***/
        while(1);
        return;
    }
#endif




    dx = 512; /* size of ROI which will be initialised */
    dy = 480; /* if too big, may not run at frame rate */

    pVil24 = Vil24Open(); /* Open VIL module */
    if ( pVil24 == NULL ) exit(0); /* Failed to open VIL module */
    Vil24Initialize(pVil24,VIL24_EIA_DEFAULT); /* set up for CCIR camera */
    Vil24SetVcrMode(pVil24,1); /* ensure lock to poor sources */
    Vil24SetROI(pVil24,64,0,dx,dy); /* set ROI in the center */

    vim = iclCreateHdr(1,1,ICL_IMG_CUSTOM); /* create image header structure ... */
    Vil24InitImgHdr(pVil24,vim); /* ... and initialise to describe VIL */

    VolSetDisplay(VOL_VGA8_1024); /* setup colour display for VIM-8 */
    VolSetGreyLUT();

    p[0] = (PTREQ *) (MP_PARM_RAM + 0x2c0);
    p[1] = p[0] + 1;
    p[2] = p[0] + 2;
    p[3] = p[0] + 3;
    p[4] = p[0] + 4;
    p[5] = p[0] + 5;
    p[6] = p[0] + 6;
    p[7] = p[0] + 7;
    p[8] = p[0] + 8;
    p[9] = p[0] + 9;

    /* Set MP list pointer to point to first PT */
    ptr = (long *) (MP_PTREQ_PTR);
    *ptr = (long) p[0];

    /* Fifo Bank 0 -> DRAM (8 bit data) */

    p[0]->link = p[0];             /* point to this PT */
    p[0]->word[0] = 0x80000000;    /* Weird Fifo to linear */
    p[0]->word[1] = 0xa0000001;    /* Src address is VIM pixel fifo */
    p[0]->word[2] = 0x80300000;    /* Dst address is DRAM */
    p[0]->word[3] = 0x01ff0001;    /* Src B count Src A count */
    p[0]->word[4] = 0x00ef0200;    /* Dst B count Dst A count */
    p[0]->word[5] = 0xef;          /* Src C count */
    p[0]->word[6] = 0;             /* Dst C count */
    p[0]->word[7] = 0x08;          /* Src B pitch */
    p[0]->word[8] = 0x200;         /* Dst B pitch */
    p[0]->word[9] = 0x400;         /* Src C pitch */
    p[0]->word[10] = 0;            /* Dst C pitch */
    p[0]->word[11] = 0;            /* Src transparency upper */
    p[0]->word[12] = 0;            /* Src transparency lower */
    p[0]->word[13] = 0;            /* Reserved */
    p[0]->word[14] = 0;            /* Reserved */

    /* SDRAM -> internal coefficient blocks (16 bit data) (32x16 words) */

    p[1]->link = p[1];             /* point to this PT */
    p[1]->word[0] = 0x80000000;    /* Contig. Mem to int no update */
    p[1]->word[1] = 0x80320000;    /* Src address is SDRAM */
    p[1]->word[2] = 0x00008000;    /* Dst address is internal */
    p[1]->word[3] = 0x000f0040;    /* Src B count Src A count */
    p[1]->word[4] = 0x00000400;    /* Dst B count Dst A count */
    p[1]->word[5] = 0x00;          /* Src C count */
    p[1]->word[6] = 0;             /* Dst C count */
    p[1]->word[7] = 0x0400;        /* Src B pitch */
    p[1]->word[8] = 0x000;         /* Dst B pitch */
    p[1]->word[9] = 0x40;          /* Src C pitch */
    p[1]->word[10] = 0x1000;       /* Dst C pitch */
    p[1]->word[11] = 0;            /* Src transparency upper */
    p[1]->word[12] = 0;            /* Src transparency lower */
    p[1]->word[13] = 0;            /* Reserved */
    p[1]->word[14] = 0;            /* Reserved */

    /* SDRAM (8 bit data) -> VRAM (raw image) */

    p[2]->link = p[2];             /* point to this PT */
    p[2]->word[0] = 0x80000000;    /* linear to VRAM */
    p[2]->word[1] = 0x80300000;    /* Src address is SDRAM */
    p[2]->word[2] = 0xb4000000;    /* Dst address is VRAM */
    p[2]->word[3] = 0x00038000;    /* Src B count Src A count */
    p[2]->word[4] = 0x00ff0200;    /* Dst B count Dst A count */
    p[2]->word[5] = 0x00;          /* Src C count */
    p[2]->word[6] = 0;             /* Dst C count */
    p[2]->word[7] = 0x8000;        /* Src B pitch */
    p[2]->word[8] = 0x800;         /* Dst B pitch */
    p[2]->word[9] = 0x00;          /* Src C pitch */
    p[2]->word[10] = 0x0000;       /* Dst C pitch */
    p[2]->word[11] = 0;            /* Src transparency upper */
    p[2]->word[12] = 0;            /* Src transparency lower */
    p[2]->word[13] = 0;            /* Reserved */
    p[2]->word[14] = 0;            /* Reserved */


/* DRAM (rows and 8 bit data) -> internal RAM */ 

/* Can be used 4 times then change dst <- this can be done 8 times */ 


p[3]->link = p[3]; /* point to this PT */
p[3]->word[0] = 0x80000202; /* Contig. Mem to int w/s update */
p[3]->word[1] = 0x80300000; /* Src address is DRAM */
p[3]->word[2] = 0x00000000; /* Dst address is internal */
p[3]->word[3] = 0x00000800; /* Src B count Src A count */
p[3]->word[4] = 0x00000800; /* Dst B count Dst A count */
p[3]->word[5] = 0x00; /* Src C count */
p[3]->word[6] = 0; /* Dst C count */
p[3]->word[7] = 0x0000; /* Src B pitch */
p[3]->word[8] = 0x000; /* Dst B pitch */
p[3]->word[9] = 0x0800; /* Src C pitch */
p[3]->word[10] = 0x1000; /* Dst C pitch */
p[3]->word[11] = 0; /* Src transparency upper */
p[3]->word[12] = 0; /* Src transparency lower */
p[3]->word[13] = 0; /* Reserved */
p[3]->word[14] = 0; /* Reserved */


/* DRAM (columns and 16 bit data) -> internal RAM */ 

/* Can be used 4 times then change dst <- this can be done 8 times */ 


p[4]->link = p[4]; /* point to this PT */
p[4]->word[0] = 0x80000002; /* Contig. Mem to int w/dst updt */
p[4]->word[1] = 0x80320000; /* Src address is DRAM */
p[4]->word[2] = 0x00000000; /* Dst address is internal */
p[4]->word[3] = 0x00ff0010; /* Src B count Src A count */
p[4]->word[4] = 0x00001000; /* Dst B count Dst A count */
p[4]->word[5] = 0x00; /* Src C count */
p[4]->word[6] = 0; /* Dst C count */
p[4]->word[7] = 0x0400; /* Src B pitch */
p[4]->word[8] = 0x000; /* Dst B pitch */
p[4]->word[9] = 0x10; /* Src C pitch */
p[4]->word[10] = 0x1000; /* Dst C pitch */
p[4]->word[11] = 0; /* Src transparency upper */
p[4]->word[12] = 0; /* Src transparency lower */
p[4]->word[13] = 0; /* Reserved */
p[4]->word[14] = 0; /* Reserved */


/* internal RAM -> DRAM (columns and 16 bit data) */ 

/* Can be used 4 times then change src then this can be repeated 8 times */

p[5]->link = p[5]; /* point to this PT */
p[5]->word[0] = 0x80000200; /* int to Contig. Mem w/src updt */
p[5]->word[1] = 0x00000000; /* Src address is internal */
p[5]->word[2] = 0x80320000; /* Dst address is DRAM */
p[5]->word[3] = 0x00001000; /* Src B count Src A count */
p[5]->word[4] = 0x00ff0010; /* Dst B count Dst A count */
p[5]->word[5] = 0x00; /* Src C count */
p[5]->word[6] = 0; /* Dst C count */
p[5]->word[7] = 0x0000; /* Src B pitch */
p[5]->word[8] = 0x400; /* Dst B pitch */
p[5]->word[9] = 0x1000; /* Src C pitch */
p[5]->word[10] = 0x0010; /* Dst C pitch */
p[5]->word[11] = 0; /* Src transparency upper */
p[5]->word[12] = 0; /* Src transparency lower */
p[5]->word[13] = 0; /* Reserved */
p[5]->word[14] = 0; /* Reserved */


/* DRAM (rows and 16 bit data) -> internal RAM */ 

/* Can be used 4 times then change dst then this can be repeated 8 times */ 


p[6]->link = p[6]; /* point to this PT */
p[6]->word[0] = 0x80000002; /* Contig. Mem to int w/dst updt */
p[6]->word[1] = 0x80320000; /* Src address is DRAM */
p[6]->word[2] = 0x00000000; /* Dst address is internal */
p[6]->word[3] = 0x00001000; /* Src B count Src A count */
p[6]->word[4] = 0x00001000; /* Dst B count Dst A count */
p[6]->word[5] = 0x00; /* Src C count */
p[6]->word[6] = 0; /* Dst C count */
p[6]->word[7] = 0x0000; /* Src B pitch */
p[6]->word[8] = 0x000; /* Dst B pitch */
p[6]->word[9] = 0x0000; /* Src C pitch */
p[6]->word[10] = 0x1000; /* Dst C pitch */
p[6]->word[11] = 0; /* Src transparency upper */
p[6]->word[12] = 0; /* Src transparency lower */
p[6]->word[13] = 0; /* Reserved */
p[6]->word[14] = 0; /* Reserved */

/* internal RAM -> DRAM (rows and 16 bit data) */ 

/* Can be used 4 times then change src then this can be repeated 8 times */ 

p[7]->link = p[7]; /* point to this PT */
p[7]->word[0] = 0x80000200; /* int to DRAM w/src updt */
p[7]->word[1] = 0x00000000; /* Src address is internal */
p[7]->word[2] = 0x80320000; /* Dst address is DRAM */
p[7]->word[3] = 0x00001000; /* Src B count Src A count */
p[7]->word[4] = 0x00001000; /* Dst B count Dst A count */
p[7]->word[5] = 0x00; /* Src C count */
p[7]->word[6] = 0; /* Dst C count */
p[7]->word[7] = 0x0000; /* Src B pitch */
p[7]->word[8] = 0x000; /* Dst B pitch */
p[7]->word[9] = 0x1000; /* Src C pitch */
p[7]->word[10] = 0x0000; /* Dst C pitch */
p[7]->word[11] = 0; /* Src transparency upper */
p[7]->word[12] = 0; /* Src transparency lower */
p[7]->word[13] = 0; /* Reserved */
p[7]->word[14] = 0; /* Reserved */

/* internal RAM -> VRAM (rows and 8 bit data) */ 

/* Can be used 4 times then change src then this can be repeated 8 times */ 

p[8]->link = p[8]; /* point to this PT */
p[8]->word[0] = 0x80000202; /* int to VRAM w/dst updt */
p[8]->word[1] = 0x00000000; /* Src address is internal */
p[8]->word[2] = 0xb4000200; /* Dst address is VRAM */
p[8]->word[3] = 0x00000800; /* Src B count Src A count */
p[8]->word[4] = 0x00030200; /* Dst B count Dst A count */
p[8]->word[5] = 0x00; /* Src C count */
p[8]->word[6] = 0; /* Dst C count */
p[8]->word[7] = 0x0000; /* Src B pitch */
p[8]->word[8] = 0x400; /* Dst B pitch */
p[8]->word[9] = 0x1000; /* Src C pitch */
p[8]->word[10] = 0x1000; /* Dst C pitch */
p[8]->word[11] = 0; /* Src transparency upper */
p[8]->word[12] = 0; /* Src transparency lower */
p[8]->word[13] = 0; /* Reserved */
p[8]->word[14] = 0; /* Reserved */

/* internal RAM -> SDRAM (byte stream 8 bit data) */ 

p[9]->link = p[9]; /* point to this PT */
p[9]->word[0] = 0x80000202; /* int to VRAM w/dst updt */
p[9]->word[1] = 0x01000630; /* Src address is internal */
p[9]->word[2] = 0x90023000; /* Dst address is Shared DRAM */
p[9]->word[3] = 0; /* Src B count Src A count */
p[9]->word[4] = 0; /* Dst B count Dst A count */
p[9]->word[5] = 0x00; /* Src C count */
p[9]->word[6] = 0; /* Dst C count */
p[9]->word[7] = 0x0000; /* Src B pitch */
p[9]->word[8] = 0x000; /* Dst B pitch */
p[9]->word[9] = 0x1000; /* Src C pitch */
p[9]->word[10] = 0; /* Dst C pitch */
p[9]->word[11] = 0; /* Src transparency upper */
p[9]->word[12] = 0; /* Src transparency lower */
p[9]->word[13] = 0; /* Reserved */
p[9]->word[14] = 0; /* Reserved */

Vil24SetSequenceMode(pVil24,1); /* set VIL to sequence mode */

Vil24StartCapture(pVil24); /* start capturing images */

IE = 0x01;

command(0x2000000f); /* unhalt PP0 */

vert_loop_index[0] = 16;
vert_loop_index[1] = 4;
vert_loop_index[2] = 1;
vert_loop_index[3] = 1;
vert_loop_index[4] = 1;

horiz_loop_index[0] = 15;
horiz_loop_index[1] = 5;
horiz_loop_index[2] = 1;
horiz_loop_index[3] = 1;
horiz_loop_index[4] = 1;

jump_col[0] = 16;
jump_col[1] = 64;
jump_col[2] = 256;
jump_col[3] = 256;
jump_col[4] = 256;

jump_row[0] = 0x1000; /* not used */
jump_row[1] = 0x3000;
jump_row[2] = 0xF000;
jump_row[3] = 0x10000;
jump_row[4] = 0x10000;

/* Zero out unused lines in the input (lines 240-255) */ 

unused_pixels = (long *) 0x8031e000; 

for (i=0;i<2048;i++) 

NOCACHE_INT(unused_pixels[i]) = 0;

/* Initialize the adaptive bit allocation tables on the PPs */ 

init_alloc_table(2); 

Vil24SetReadBank(pVil24,0); /* prepare to read 1st field (even lines) */ 

/******************************************************************************/

while (1) 

{ 
#ifdef CAMERA

Vil24WaitForData(pVil24); /* wait for image data */

Vil24SetReadBank(pVil24,0); /* prepare to read 1st field */

#endif

encode_time = 0xffffffff - TCOUNT;

TCOUNT = 0xffffffff;

#ifdef CAMERA

*ptr = (long) p[0];

/* kick off TC to transfer pixel info to SDRAM */

PKTREQ |= MP_PKTREQ_P_BIT; i = 0; while (PKTREQ & 0x02);

#endif

/* clear the interrupt flag */

INTPEN = 0xf0000;

#ifdef HOST

/* Read Compression Ratio */

CilReadMailbox(1, (PUINT32) compression_ratio);

if (*compression_ratio > 4) *compression_ratio = 2;
if (*compression_ratio < 0) *compression_ratio = 2;

if (*compression_ratio != last_comp)
{
last_comp = *compression_ratio;
init_alloc_table(last_comp);
}

#endif

/* Display the incoming raw data (doubling the rows) */ 
p[2]->word[2] = 0xb4000000;

*ptr = (long) p[2];

/* kick off TC to transfer pixel from SDRAM to VRAM */

PKTREQ |= MP_PKTREQ_P_BIT; i = 0; while (PKTREQ & 0x02);

p[2]->word[2] = 0xb4000400;

/* kick off TC to transfer pixel from SDRAM to VRAM */

PKTREQ |= MP_PKTREQ_P_BIT; i = 0; while (PKTREQ & 0x02);

/* Encode coefficients of the raw image */ 

#ifdef ENCODE 

/* TCOUNT = 0xffffffff; */

p[3]->word[1] = 0x80300000; /* Src address is SDRAM */

for (lp=0; lp<5; lp++)
{

/* Do 32 rows */

for (i=0; i<horiz_loop_index[lp]; i++)
{

if (i==0)
{
p[6]->word[1] = 0x80320000; /* Src address is SDRAM */
p[7]->word[2] = 0x80320000; /* Dst address is SDRAM */
}

p[3]->word[2] = 0x00000000; /* Dst address is internal */

p[6]->word[2] = 0x00000000; /* Dst address is internal */

p[7]->word[1] = 0x00000000; /* Src address is internal */

if (lp==0)
*ptr = (long) p[3];
else
*ptr = (long) p[6];

if (i==0)
{
switch (lp) {
case 0:
{
/* p[6] never used here to move coefficients in the first time, */
/* p[3] is used instead */
p[6]->word[3] = 0x00001000;
p[6]->word[4] = 0x00001000;
p[6]->word[5] = 0x00000000;
p[6]->word[7] = 0x00000000;
p[6]->word[9] = 0x00000000;

p[7]->word[3] = 0x00001000;
p[7]->word[4] = 0x00001000;
p[7]->word[6] = 0x00000000;
p[7]->word[8] = 0x00000000;
p[7]->word[10] = 0x00000000;
break;
}

case 1:
{
p[6]->word[3] = 0x00ff0002;
p[6]->word[4] = 0x00000c00;
p[6]->word[5] = 0x00000005;
p[6]->word[7] = 0x00000004;
p[6]->word[9] = 0x00000800;

p[7]->word[3] = 0x00000c00;
p[7]->word[4] = 0x00ff0002;
p[7]->word[6] = 0x00000005;
p[7]->word[8] = 0x00000004;
p[7]->word[10] = 0x00000800;
break;
}

case 2:
{
p[6]->word[3] = 0x007f0002;
p[6]->word[4] = 0x00000f00;
p[6]->word[5] = 0x0000000e;
p[6]->word[7] = 0x00000008;
p[6]->word[9] = 0x00001000;

p[7]->word[3] = 0x00000f00;
p[7]->word[4] = 0x007f0002;
p[7]->word[6] = 0x0000000e;
p[7]->word[8] = 0x00000008;
p[7]->word[10] = 0x00001000;
break;
}

case 3:
{
p[6]->word[3] = 0x003f0002;
p[6]->word[4] = 0x00000400;
p[6]->word[5] = 0x00000007;
p[6]->word[7] = 0x00000010;
p[6]->word[9] = 0x00002000;

p[7]->word[3] = 0x00000400;
p[7]->word[4] = 0x003f0002;
p[7]->word[6] = 0x00000007;
p[7]->word[8] = 0x00000010;
p[7]->word[10] = 0x00002000;
break;
}

case 4:
{
p[6]->word[3] = 0x001f0002;
p[6]->word[4] = 0x00000100;
p[6]->word[5] = 0x00000003;
p[6]->word[7] = 0x00000020;
p[6]->word[9] = 0x00004000;

p[7]->word[3] = 0x00000100;
p[7]->word[4] = 0x001f0002;
p[7]->word[6] = 0x00000003;
p[7]->word[8] = 0x00000020;
p[7]->word[10] = 0x00004000;
break;
}

default:
break;
}
}



PKTREQ |= MP_PKTREQ_P_BIT; /* kick off transfer from SDRAM to internal */

i = i + 1;
i = i - 1;

while(PKTREQ & 0x02);

command(0x00002001); /* send msg interrupt to PP0 */

p[6]->word[1] = p[6]->word[1] + jump_row[lp];

PKTREQ |= MP_PKTREQ_P_BIT; /* kick off transfer from SDRAM to internal */

i = i + 1;
i = i - 1;

while(PKTREQ & 0x02);

command(0x00002002); /* send msg interrupt to PP1 */

p[6]->word[1] = p[6]->word[1] + jump_row[lp];

PKTREQ |= MP_PKTREQ_P_BIT; /* kick off transfer from SDRAM to internal */

i = i + 1;
i = i - 1;

while(PKTREQ & 0x02);

command(0x00002004); /* send msg interrupt to PP2 */

p[6]->word[1] = p[6]->word[1] + jump_row[lp];

PKTREQ |= MP_PKTREQ_P_BIT; /* kick off transfer from SDRAM to internal */

i = i + 1;
i = i - 1;

while(PKTREQ & 0x02);

command(0x00002008); /* send msg interrupt to PP3 */


B-54 




NAWCWD TP 8442 


p[6]->word[1] = p[6]->word[1] + jump_row[lp];

while((INTPEN & 0x10000)==0x00); /* poll PP0 */

INTPEN = 0x10000; /* clear the interrupt flag */

*ptr = (long) p[7];

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from internal to DRAM */

i = i + 1;
i = i - 1;

while(PKTREQ & 0x02);

p[7]->word[2] = p[7]->word[2] + jump_row[lp];

while((INTPEN & 0x20000)==0x00); /* poll PP1 */

INTPEN = 0x20000; /* clear the interrupt flag */

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from internal to DRAM */

i = i + 1;
i = i - 1;

while(PKTREQ & 0x02);

p[7]->word[2] = p[7]->word[2] + jump_row[lp];

while((INTPEN & 0x40000)==0x00); /* poll PP2 */

INTPEN = 0x40000; /* clear the interrupt flag */

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from internal to DRAM */

i = i + 1;
i = i - 1;

while(PKTREQ & 0x02);

p[7]->word[2] = p[7]->word[2] + jump_row[lp];

while((INTPEN & 0x80000)==0x00); /* poll PP3 */

INTPEN = 0x80000; /* clear the interrupt flag */

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from internal to DRAM */

i = i + 1;
i = i - 1;

while(PKTREQ & 0x02);

p[7]->word[2] = p[7]->word[2] + jump_row[lp];

} /* end for i */ 

NOCACHE_INT(global_mean) = (NOCACHE_INT(mean_pp[0]) + NOCACHE_INT(mean_pp[1]) +
NOCACHE_INT(mean_pp[2]) + NOCACHE_INT(mean_pp[3])) / (512*240);

/* Do the appropriate number of columns depending on the scale */

if (lp!=4)

for (i=0; i<vert_loop_index[lp]; i++)
{

if (i==0)
{
p[4]->word[1] = 0x80320000; /* Src address is DRAM */
p[5]->word[2] = 0x80320000; /* Dst address is DRAM */
}
p[4]->word[2] = 0x00000000; /* Dst address is internal */

*ptr = (long) p[4];

if (i==0)
{
switch (lp) {
case 0:
{
p[4]->word[3] = 0x00ef0010;
p[4]->word[4] = 0x00000f00;
p[4]->word[5] = 0x00000000;
p[4]->word[7] = 0x00000400;
p[4]->word[9] = 0x00000000;

p[5]->word[3] = 0x00000f00;
p[5]->word[4] = 0x00ef0010;
p[5]->word[6] = 0x00000000;
p[5]->word[8] = 0x00000400;
p[5]->word[10] = 0x00000000;
break;
}

case 1:
{
p[4]->word[3] = 0x000f0002;
p[4]->word[4] = 0x00000f00;
p[4]->word[5] = 0x00000077;
p[4]->word[7] = 0x00000004;
p[4]->word[9] = 0x00000800;

p[5]->word[3] = 0x00000f00;
p[5]->word[4] = 0x000f0002;
p[5]->word[6] = 0x00000077;
p[5]->word[8] = 0x00000004;
p[5]->word[10] = 0x00000800;
break;
}

case 2:
{
p[4]->word[3] = 0x001f0002;
p[4]->word[4] = 0x00000f00;
p[4]->word[5] = 0x0000003b;
p[4]->word[7] = 0x00000008;
p[4]->word[9] = 0x00001000;

p[5]->word[3] = 0x00000f00;
p[5]->word[4] = 0x001f0002;
p[5]->word[6] = 0x0000003b;
p[5]->word[8] = 0x00000008;
p[5]->word[10] = 0x00001000;
break;
}

case 3:
{
p[4]->word[3] = 0x000f0002;
p[4]->word[4] = 0x000003c0;
p[4]->word[5] = 0x0000001d;
p[4]->word[7] = 0x00000010;
p[4]->word[9] = 0x00002000;

p[5]->word[3] = 0x000003c0;
p[5]->word[4] = 0x000f0002;
p[5]->word[6] = 0x0000001d;
p[5]->word[8] = 0x00000010;
p[5]->word[10] = 0x00002000;
break;
}

default:
break;
}
}

/* kick off TC to transfer columns of pixels from SDRAM to internal */

PKTREQ |= MP_PKTREQ_P_BIT;

i = i + 1;
i = i - 1;

while(PKTREQ & 0x02);

command(0x00002001); /* send msg interrupt to PP0 */

p[4]->word[1] = p[4]->word[1] + jump_col[lp]; /* Src address is SDRAM */

/* kick off TC to transfer columns of pixels from SDRAM to internal */

PKTREQ |= MP_PKTREQ_P_BIT;

i = i + 1;
i = i - 1;

while(PKTREQ & 0x02);

command(0x00002002); /* send msg interrupt to PP1 */

p[4]->word[1] = p[4]->word[1] + jump_col[lp]; /* Src address is DRAM */

/* kick off TC to transfer columns of pixels from DRAM to internal */

PKTREQ |= MP_PKTREQ_P_BIT;

i = i + 1;
i = i - 1;

while(PKTREQ & 0x02);

command(0x00002004); /* send msg interrupt to PP2 */

p[4]->word[1] = p[4]->word[1] + jump_col[lp]; /* Src address is DRAM */

/* kick off TC to transfer columns of pixels from DRAM to internal */

PKTREQ |= MP_PKTREQ_P_BIT;

i = i + 1;
i = i - 1;

while(PKTREQ & 0x02);

command(0x00002008); /* send msg interrupt to PP3 */

p[4]->word[1] = p[4]->word[1] + jump_col[lp]; /* Src address is DRAM */

p[5]->word[1] = 0x00000000; /* Src address is internal */

*ptr = (long) p[5];

while((INTPEN & 0x10000)==0x00); /* poll PP0 */

INTPEN = 0x10000; /* clear the interrupt flag */

PKTREQ |= MP_PKTREQ_P_BIT; /* kick off transfer from internal to DRAM */

i = i + 1;
i = i - 1;

while(PKTREQ & 0x02);

p[5]->word[2] = p[5]->word[2] + jump_col[lp]; /* Dst address is DRAM */

while((INTPEN & 0x20000)==0x00); /* poll PP1 */

INTPEN = 0x20000; /* clear the interrupt flag */

PKTREQ |= MP_PKTREQ_P_BIT; /* kick off transfer from internal to DRAM */

i = i + 1;
i = i - 1;

while(PKTREQ & 0x02);

p[5]->word[2] = p[5]->word[2] + jump_col[lp]; /* Dst address is DRAM */

while((INTPEN & 0x40000)==0x00); /* poll PP2 */

INTPEN = 0x40000; /* clear the interrupt flag */

PKTREQ |= MP_PKTREQ_P_BIT; /* kick off transfer from internal to DRAM */

i = i + 1;
i = i - 1;

while(PKTREQ & 0x02);

p[5]->word[2] = p[5]->word[2] + jump_col[lp]; /* Dst address is DRAM */

while((INTPEN & 0x80000)==0x00); /* poll PP3 */

INTPEN = 0x80000; /* clear the interrupt flag */

PKTREQ |= MP_PKTREQ_P_BIT; /* kick off transfer from internal to DRAM */

i = i + 1;
i = i - 1;

while(PKTREQ & 0x02);

p[5]->word[2] = p[5]->word[2] + jump_col[lp]; /* Dst address is DRAM */

}

} /* end for lp */
#endif 


*pp_stop_encode = 0; 


#ifdef ENCODE_STREAM

/* Send over coefficients for maxval calculation */

p[6]->word[1] = 0x80320000; /* Src address is SDRAM */
p[6]->word[3] = 0x001f0002;
p[6]->word[4] = 0x00000100;
p[6]->word[5] = 0x00000003;
p[6]->word[7] = 0x00000020;
p[6]->word[9] = 0x00004000;

*ptr = (long) p[6];

p[6]->word[2] = 0x00000000; /* Dst address is internal */

/* kick off 128 word transfer from SDRAM to internal */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

command(0x00002001); /* send msg interrupt to PP0 */

p[6]->word[1] = p[6]->word[1] + jump_row[4];

/* kick off 128 word transfer from SDRAM to internal */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

command(0x00002002); /* send msg interrupt to PP1 */

p[6]->word[1] = p[6]->word[1] + jump_row[4];

/* kick off 128 word transfer from SDRAM to internal */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

command(0x00002004); /* send msg interrupt to PP2 */

p[6]->word[1] = p[6]->word[1] + jump_row[4];

/* kick off 128 word transfer from SDRAM to internal */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

command(0x00002008); /* send msg interrupt to PP3 */

while((INTPEN & 0x0F0000)!=0x0F0000); /* wait for PP0,1,2,3 */

INTPEN = 0xF0000; /* clear the interrupt flag */

temp_maxval = NOCACHE_INT(local_maxval[0]);

if (NOCACHE_INT(local_maxval[1]) > temp_maxval) temp_maxval =
NOCACHE_INT(local_maxval[1]);

if (NOCACHE_INT(local_maxval[2]) > temp_maxval) temp_maxval =
NOCACHE_INT(local_maxval[2]);

if (NOCACHE_INT(local_maxval[3]) > temp_maxval) temp_maxval =
NOCACHE_INT(local_maxval[3]);

i=0;

while(temp_maxval>=1)
{
i++;
temp_maxval = temp_maxval>>1;
}

temp_maxval = 1<<i;

NOCACHE_INT(global_maxval) = i;

command(0x0000200F); /* send msg interrupt to PP0,1,2,3 */

/*************************************************************************/ 
#ifdef HOST

/* Tell host where bit stream can be found */

ReturnVal = CilWriteMailbox(0, stream_addr[flip]);

if (ReturnVal != CIL_OK)
{
/*** Cannot write to mailbox ***/
while(1);
return;
}

#endif

table_pointer = stream_addr[flip] + 8; 

ppnexttask[0] = 0;
ppnexttask[1] = 0;
ppnexttask[2] = 0;
ppnexttask[3] = 0;

current_block = 0;
ppdone = 0;

while (ppdone!=4)
{

if ((INTPEN & 0xF0000)!=0x00)
{

DONE1 = 0;

/* find out which PP requested service */

for (pprequesting=0; pprequesting<4; pprequesting++)
{
if ((INTPEN & (1<<(16+pprequesting))) != 0) break;
}

/* clear the interrupt flag */

INTPEN = 0x10000<<pprequesting;

/* Move bitstream from internal to SDRAM, and move first block pair of coefficients 
if necessary */ 

if ((ppnexttask[pprequesting] == 3) && (!DONE1))
{

p[9]->word[1] = 0x01000630 + (pprequesting<<12); /* Src address is internal Parameter RAM */

p[9]->word[2] = table_address[pprequesting]; /* Dst address is external SDRAM */

*ptr = (long) p[9];

/* Read the number of bytes that need to be transferred */

p[9]->word[3] = p[9]->word[4] = p[9]->word[10] = table_size[pprequesting];

/* kick off transfer from internal to SDRAM */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

if (current_block != 40)
{

p[1]->word[1] = 0x80320000 + (current_block>>3)*0x04000 + (current_block&7)*64;
/* Src for block changes */

p[1]->word[2] = 0x00008000 + (pprequesting<<12); /* Dst address is internal RAM2 */

*ptr = (long) p[1];

/* kick off 1024 byte (32x16 words) coefficient block transfer from DRAM to
internal */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

p[1]->word[1] = p[1]->word[1] + 0x200; /* Src for block changes */

p[1]->word[2] = p[1]->word[2] + 0x400; /* Dst address is internal RAM2 */

/* kick off 1024 byte (32x16 words) coefficient block transfer from DRAM to
internal */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

/* write out TBYTES value for this block to the appropriate PP */

table_size[pprequesting] = local_alloc[current_block];

*(unsigned short *) (0x01000600 + (pprequesting<<12)) =
table_size[pprequesting];

command(0x00002000 + (1<<pprequesting)); /* send msg interrupt to PP that requested service */

table_address[pprequesting] = table_pointer;
table_pointer = table_pointer + table_size[pprequesting];
ppinfo[pprequesting] = current_block;
ppnexttask[pprequesting] = 1;
current_block = current_block + 1;

}

else ppdone = ppdone + 1;

DONE1 = 1;

}

/* Give PP second and third block pairs to process */ 

if (((ppnexttask[pprequesting] > 0) && (ppnexttask[pprequesting] < 3)) && (!DONE1))
{

p[1]->word[1] = 0x80320000 + (ppinfo[pprequesting]>>3)*0x04000 +
(ppinfo[pprequesting]&7)*64 + 0x14000*ppnexttask[pprequesting]; /* Src for block changes */

p[1]->word[2] = 0x00008000 + (pprequesting<<12); /* Dst address is internal RAM2 */

*ptr = (long) p[1];

/* kick off 1024 byte (32x16 words) coefficient block transfer from DRAM to
internal */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

p[1]->word[1] = p[1]->word[1] + 0x200; /* Src for block changes */

p[1]->word[2] = p[1]->word[2] + 0x400; /* Dst address is internal RAM2 */

/* kick off 1024 byte (32x16 words) coefficient block transfer from DRAM to
internal */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

command(0x00002000 + (1<<pprequesting)); /* send msg interrupt to PP that requested service */

ppnexttask[pprequesting] += 1;

DONE1 = 1;

}


/* Give PP first block pair to process */ 

if ((ppnexttask[pprequesting] == 0) && (!DONE1))
{

p[1]->word[1] = 0x80320000 + (current_block>>3)*0x04000 + (current_block&7)*64;
/* Src for block changes */

p[1]->word[2] = 0x00008000 + (pprequesting<<12); /* Dst address is internal RAM2 */

*ptr = (long) p[1];

/* kick off 1024 byte (32x16 words) coefficient block transfer from DRAM to
internal */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

p[1]->word[1] = p[1]->word[1] + 0x200; /* Src for block changes */

p[1]->word[2] = p[1]->word[2] + 0x400; /* Dst address is internal RAM2 */

/* kick off 1024 byte (32x16 words) coefficient block transfer from DRAM to
internal */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

/* write out TBYTES value for this block to the appropriate PP */

table_size[pprequesting] = local_alloc[current_block];

*(unsigned short *) (0x01000600 + (pprequesting<<12)) = table_size[pprequesting];

command(0x00002000 + (1<<pprequesting)); /* send msg interrupt to PP that requested service */

table_address[pprequesting] = table_pointer;
table_pointer = table_pointer + table_size[pprequesting];

ppinfo[pprequesting] = current_block;
ppnexttask[pprequesting] = 1;

current_block = current_block + 1;

DONE1 = 1;

}


} 


} 

*pp_stop_encode = 1;

command(0x0000200F); /* send msg interrupt to all PPs */

#endif 

/* Write dynamic values to the bitstream */ 
temp_ulong = 0; 

temp_ulong = temp_ulong | (tbuf_pp0[0] << 24);
temp_ulong = temp_ulong | (global_mean << 16);
temp_ulong = temp_ulong | (global_maxval << 8);

NOCACHE_INT(*(UINT32 *) stream_addr[flip]) = temp_ulong;

temp_ulong = *compression_ratio;

#ifndef variable_compression

NOCACHE_INT(*(UINT32 *) (stream_addr[flip] + 4)) = temp_ulong & 0x000000ff;
#else

NOCACHE_INT(*(UINT32 *) (stream_addr[flip] + 4)) = temp_ulong;

#endif

/* Flip which buffer we are writing to */

flip = 1 - flip;

/* Tell Host it can now get the latest bitstream */ 
#ifdef HOST 

ReturnVal = CilRaiseSignalNumber(FrameDoneBit);
if (ReturnVal != CIL_OK)

{ 

/*** Cannot raise signal ***/ 
return; 

} 

#endif 

/* Calculate and display latest frame timings */ 

proc_time = Oxffffffff - TCOUNT; 

#ifdef SHOWDISPLAY 

if (Semaphore == 0)

{ 
p[7]->word[3] = 0x00001000;
p[7]->word[4] = 0x00001000;
p[7]->word[6] = 0x00000000;
p[7]->word[8] = 0x00000000;
p[7]->word[10] = 0x00001000;


/* DRAM (8 bit data) -> VRAM (raw image) */ 


p[2]->link = p[2]; /* point to this PT */
p[2]->word[0] = 0x80000000; /* linear to VRAM */
p[2]->word[1] = 0x80300000; /* Src address is SDRAM */
p[2]->word[2] = 0xb4000000; /* Dst address is VRAM */
p[2]->word[3] = 0x00110010; /* Src B count Src A count */
p[2]->word[4] = 0x00110010; /* Dst B count Dst A count */
p[2]->word[5] = 0x00; /* Src C count */
p[2]->word[6] = 0; /* Dst C count */
p[2]->word[7] = 0xb0; /* Src B pitch */
p[2]->word[8] = 0x400; /* Dst B pitch */
p[2]->word[9] = 0x00; /* Src C pitch */
p[2]->word[10] = 0x0000; /* Dst C pitch */
p[2]->word[11] = 0; /* Src transparency upper */
p[2]->word[12] = 0; /* Src transparency lower */
p[2]->word[13] = 0; /* Reserved */
p[2]->word[14] = 0; /* Reserved */


temp_time = 1.0/(encode_time*0.000000025);

sprintf(stringl,"%f",temp_time);

for (i=0;i<4;i++)
{

if (stringl[i]!='.')
{
temp_str = stringl[i];
temp_str = temp_str - 48;
temp_str = temp_str * 16;
}
else
{
temp_str = 160;
}

p[2]->word[1] = (long)&number_pixels;
p[2]->word[1] += temp_str;
p[2]->word[2] = 0xb4080000 + i*16;

*ptr = (long) p[2];

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from SDRAM to VRAM */

i = i + 1;
i = i - 1;

while(PKTREQ & 0x02);

}

temp_time = proc_time*0.000025;

sprintf(stringl,"%f",temp_time);

for (i=0;i<4;i++)
{

if (stringl[i]!='.')
{
temp_str = stringl[i];
temp_str = temp_str - 48;
temp_str = temp_str * 16;
}
else
{
temp_str = 160;
}

p[2]->word[1] = (long)&number_pixels;
p[2]->word[1] += temp_str;
p[2]->word[2] = 0xb4084800 + i*16;

*ptr = (long) p[2];

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from SDRAM to VRAM */

i = i + 1;
i = i - 1;

while(PKTREQ & 0x02);

}

/* Restore original p[2] PT array */ 

/* SDRAM (8 bit data) -> VRAM (raw image) */ 


p[2]->link = p[2]; /* point to next PT */
p[2]->word[0] = 0x80000000; /* linear to VRAM */
p[2]->word[1] = 0x80300000; /* Src address is DRAM */
p[2]->word[2] = 0xb4000000; /* Dst address is VRAM */
p[2]->word[3] = 0x00038000; /* Src B count Src A count */
p[2]->word[4] = 0x00ff0200; /* Dst B count Dst A count */
p[2]->word[5] = 0x00; /* Src C count */
p[2]->word[6] = 0; /* Dst C count */
p[2]->word[7] = 0x8000; /* Src B pitch */
p[2]->word[8] = 0x800; /* Dst B pitch */
p[2]->word[9] = 0x00; /* Src C pitch */
p[2]->word[10] = 0x0000; /* Dst C pitch */
p[2]->word[11] = 0; /* Src transparency upper */
p[2]->word[12] = 0; /* Src transparency lower */
p[2]->word[13] = 0; /* Reserved */
p[2]->word[14] = 0; /* Reserved */


} 

#endif 

} /* end while */ 

} /* end task */
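
/* Illustrative sketch, not part of the original listing. The comments in the
   parameter tables above ("Src B count Src A count") suggest that words 3 and 4
   each pack two 16-bit counts, B count in the upper half-word and A count in the
   lower one. Assuming that reading is right, the magic constants could be built
   with a small helper. The names pt_sketch, pack_counts and pt_set_counts are
   hypothetical and do not come from the TI or board support libraries. */

```c
#include <stdint.h>

/* Hypothetical stand-in for the 15-word packet-transfer parameter table. */
typedef struct pt_sketch {
    struct pt_sketch *link;   /* next table in the linked list */
    uint32_t word[15];        /* options, addresses, counts, pitches */
} pt_sketch;

/* Pack a B count (upper 16 bits) and an A count (lower 16 bits).
   The half-word order is an assumption read off the listing's comments. */
static uint32_t pack_counts(uint16_t b_count, uint16_t a_count)
{
    return ((uint32_t)b_count << 16) | (uint32_t)a_count;
}

/* Fill the source and destination count words of a table. */
static void pt_set_counts(pt_sketch *pt,
                          uint16_t src_b, uint16_t src_a,
                          uint16_t dst_b, uint16_t dst_a)
{
    pt->word[3] = pack_counts(src_b, src_a);   /* Src B count, Src A count */
    pt->word[4] = pack_counts(dst_b, dst_a);   /* Dst B count, Dst A count */
}
```

/* For example, pt_set_counts(&pt, 0x01ff, 0x0001, 0x00ef, 0x0200) reproduces the
   values 0x01ff0001 and 0x00ef0200 that the listing writes into p[0]. */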

extern int ep_runpp0; 

main() 

{ 

int i; 

unsigned int temp; 

unsigned int *src_ptr = (unsigned int *) 0x90080000;
unsigned int *dst_ptr = (unsigned int *) 0x80000000;

/* REFCNTL = 0xffff0138; */ /* set up dram and sdram to correct refresh rate for 40 MHz C80 */

REFCNTL = 0xffff0186; /* set up dram and sdram to correct refresh rate for 50 MHz C80 */

command(0xc000000f); /* reset and halt PP0,1,2,3 */

*(int *) 0x010001b8 = (int) &ep_runpp0; /* initialize task vector */

*(int *) 0x010011b8 = (int) &ep_runpp0; /* initialize task vector */

*(int *) 0x010021b8 = (int) &ep_runpp0; /* initialize task vector */

*(int *) 0x010031b8 = (int) &ep_runpp0; /* initialize task vector */

/* upload PP code */
for (i=0;i<34000;i++)
{
temp = *src_ptr++;
*dst_ptr++ = temp;
}

/* short busy-wait delay before starting the PPs */
for (i=0;i<20000;i++)
{
temp = temp + 1;
temp = temp - 1;
}

command(0x3000000f); /* start PP0,1,2,3 by unhalting it */

/* all will take its task interrupt*/ 


/* Basic init functions */ 

#ifdef HOST

InterruptInit(); /* Init ME interrupts */

#endif

/* Basic init functions */

PtReqInit(); /* Init the ME PT functions */

TaskInitTasking(); /* Init tasking */

IclInstallPtdMalloc(); /* Install protected malloc and free function to ME */
IclPTInit(15); /* Init the Icl PT server task with a priority of 15 */

#ifdef HOST
/*
** Initialise the Cil
** Declare 4 buffers of 256 bytes each.
** These buffers are not used here - choose minimum sizes.
*/

CilInit(4,256);

#endif

TaskResume(TaskCreate(-1, task, NULL, 14, 4096)); /* Start task */

while(1==1); /* loop */

} 
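
/* Illustrative sketch, not part of the original listing. The task above computes
   the SDRAM source address of a coefficient block as
   0x80320000 + (block>>3)*0x04000 + (block&7)*64, and adds 0x14000 per step for
   the second and third block pairs. One plausible reading, offered here as an
   assumption, is 8 blocks per block row, 64 bytes across each block, and
   0x4000 bytes between block rows. The names COEFF_BASE, coeff_block_src and
   coeff_block_src_pair are hypothetical. */

```c
#include <stdint.h>

#define COEFF_BASE 0x80320000u   /* coefficient buffer base used in the listing */

/* Source address of the first block pair for a given block index. */
static uint32_t coeff_block_src(unsigned block)
{
    return COEFF_BASE
         + (uint32_t)(block >> 3) * 0x04000u   /* step to the block row */
         + (uint32_t)(block & 7u) * 64u;       /* step across the row */
}

/* Source address of a later block pair; pair is 1 or 2 in the listing
   (the value of ppnexttask for the requesting processor). */
static uint32_t coeff_block_src_pair(unsigned block, unsigned pair)
{
    return coeff_block_src(block) + 0x14000u * (uint32_t)pair;
}
```

/* With these helpers, block 9 maps to 0x80324040, and its second block pair
   to 0x80338040. */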

/*******************************************************************
*                                                                  *
* Function   : SignalHandler                                       *
* Args       : UINT32 Signals                                      *
*                                                                  *
* Description:                                                     *
*   Signals    Signals raised by host                              *
*                                                                  *
*   SignalHandler will be called when host raises signal           *
*                                                                  *
* Return Values:                                                   *
*   None                                                           *
*                                                                  *
*******************************************************************/

void 

SignalHandler(UINT32 Signals) 

{ 

if ( (Signals & HostRequestBitMask) ! —0) 

{ 

/* 

** Host has requested a block of data 
*/ 

/* TaskSignalSema(Semaphore); */ 

Semaphore = 1; 

} 

/* 

** Ignore any other signals raised - they're not for us 
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*/ 


} 


/* Initialize the adaptive bit allocation tables on the PPs */ 
/*******************************************************************

Function : init_alloc_table
Args     : UINT32 index

Description:
    index      index into the compression table

    init_alloc_table will be called when host changes the
    compression ratio

Return Values:
    None

*******************************************************************/

void
init_alloc_table(UINT32 index)
{
    int t_alloc;
    int p_alloc;
    int r_alloc;
    int k,lp;

    t_alloc = YSIZE*XSIZE/comp_table[index] - headersize;
    p_alloc = t_alloc / ((AY*AX)/6);
    r_alloc = t_alloc % ((AY*AX)/6);

    for(k=0;k<((AY*AX)/6);k++)
        local_alloc[k] = p_alloc;

    lp = 0;
    while (r_alloc>0)
    {
        local_alloc[lp++]++;
        r_alloc--;
    }
}
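
The allocation logic in init_alloc_table can be restated as a small stand-alone C sketch: divide a total budget evenly over a number of slots, then hand out the remainder one unit at a time starting at slot 0. The function name split_budget and its parameters are hypothetical and do not appear in the original program; the sketch assumes a non-negative total and at least one slot.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the original listing: split `total`
 * units evenly over `n` slots and give the first `total % n` slots one
 * extra unit. Assumes total >= 0 and n > 0. */
static void split_budget(int total, size_t n, int *alloc)
{
    int per_slot = total / (int)n;   /* plays the role of p_alloc */
    int leftover = total % (int)n;   /* plays the role of r_alloc */
    size_t k;

    for (k = 0; k < n; k++)
        alloc[k] = per_slot;

    /* Distribute the remainder one unit per slot, lowest index first. */
    for (k = 0; leftover > 0; k++, leftover--)
        alloc[k]++;
}
```

Because the remainder is always smaller than the slot count, the second loop never runs past the end of the array, and the slot totals always add up to the original budget.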
/**************************************************************************
*                                                                         *
* Filename : pcic80.cmd                                                   *
*                                                                         *
* Description:                                                            *
*                                                                         *
* PCI/C80 linker file for the MP program                                  *
*                                                                         *
**************************************************************************/

-c
-heap 0x100000
-stack 0x10000
-l mp_cio.lib
-l \pcic80\lib\mp_task.lib
-l mp_int.lib
-l mp_rts.lib
-l mp_ptreq.lib
-l mp_ppcmd.lib
-l ppcmd.lib
-l \pcic80\lib\icl.lib
-l \pcic80\lib\vol.lib
-l \pcic80\lib\bgl.lib
-l \pcic80\lib\vil24.lib
-l \pcic80\lib\cil.lib

MEMORY
{
    RAM0     : o=0x00000000 l = 0x00800
    RAM1     : o=0x00000800 l = 0x00800
    RAM2     : o=0x00008000 l = 0x00800
    RESERV   : o=0x01000000 l = 0x00200
    MPPRAM1  : o=0x01010580 l = 0x00280
    MPPRAM   : o=0x010007D8 l = 0x00028
    SDRAM    : o=0x80000000 l = 0x300000
    DRAM     : o=0x90000000 l = 0x80000
    DRAM2    : o=0x90080000 l = 0x100000
    UNINIT   : o=0x90180000 l = 0x180000
    IMAGE    : o=0x80300000 l = 0x80000
    SPOT     : o=0x80380000 l = 0x10000
    VRAM_PAL : o=0xB0000000 l = 0x200000
    VRAM_VGA : o=0xB4000000 l = 0x400000
}

SECTIONS 

{ 

/* 

* The following section must be defined for all programs that 

* use the CIL. The section must appear in the first 8Mb 

* of DRAM and must be long enough to include all buffers 

* plus 128 bytes. This example is big enough for 4*256byte 

* buffers. 

* 

* See the user guide for more information. 

*/ 

.lsidram : { 

__CilDRAMBase = . ; 

. += 0x600; 

} > DRAM 

.text : > DRAM 

.cinit : > DRAM 

.const   : > DRAM
.switch  : > DRAM
.data    : > DRAM
.bss     : > DRAM
.cio     : > DRAM
.pcinit  : > DRAM
.ptext   : load > DRAM2, run SDRAM
font     : load > DRAM2, run SDRAM
.sysmem  : > UNINIT
.stack   : > UNINIT
sh_vars  : > MPPRAM
mp_vars  : > MPPRAM1
rawimage : > IMAGE
stream   : > SPOT
}
/**************************************************************************
*                                                                         *
* Filename : pcic80a.cmd                                                  *
*                                                                         *
* Description:                                                            *
*                                                                         *
* PCI/C80 linker file for the PP program                                  *
*                                                                         *
**************************************************************************/

-pc
-l d:\ravp\src\newlib\pp_rts.lib /* New Library w/o ATEXIT extra variables */
-pstack 256

MEMORY
{
    RAM0     : o=0x00000000 l = 0x00800
    RAM1     : o=0x00000800 l = 0x00800
    RAM2     : o=0x00008000 l = 0x00800
    RESERV   : o=0x01000000 l = 0x00200
    PRAM     : o=0x01000200 l = 0x00600
    SDRAM    : o=0x80000000 l = 0x800000
    DRAM     : o=0x90400000 l = 0x400000
    VRAM_PAL : o=0xB0000000 l = 0x200000
    VRAM_VGA : o=0xB4000000 l = 0x400000
}

SECTIONS
{
    .text    : > DRAM
    .ptext   : > DRAM
    .cinit   : > DRAM
    .const   : > DRAM
    .switch  : > DRAM
    .data    : > DRAM
    .bss     : > DRAM
    .cio     : > DRAM
    .sysmem  : > DRAM
    .stack   : > DRAM
    .pcinit  : > DRAM
    .pbss    : (PASS) > PRAM
    .psysmem : (PASS) > PRAM
    .pstack  : (PASS) > PRAM
}
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/***************************************************************************** 
**
** subpass.s (PP Program)
**
** Written by Jim Witham, Code 472300D, 939-3599
**
** File that contains the following assembly language subroutine:
**
** $subpass
**
*****************************************************************************/

        .global $subpass

        .global $stats_flag         ;unsigned char pointer *(xba + $stats_flag)
        .global $char_to_index      ;unsigned char pointer &*(xba + $char_to_index)
        .global $stats_val          ;signed short  pointer *(xba + $stats_val)
        .global $BITE               ;signed int    sw *(xba + $BITE)
        .global $STOP               ;signed int    sw *(xba + $STOP)

        .global $update_model
        .global $encode_symbol

        .global $sym_index
        .global $sym_array
        .global $do_syms

        .global $list
        .global $list_index

tempd1      .set d4
tempd2      .set d2
tempd3      .set d5
tempd4      .set d1
m           .set d6
tt          .set d7
list        .set a12
stats_val   .set
i           .set 20
j           .set 24
t           .set 28


.align 512 


/*******************************************************************
*                                                                  *
* Function  : $subpass                                             *
* Args      : t                                                    *
* Passed in : d1                                                   *
*                                                                  *
* Description:                                                     *
*   t - the current THRESH >> BITE value.                          *
*                                                                  *
*   $subpass will be called to perform a subordinate pass over     *
*   the coefficients.                                              *
*                                                                  *
* Return Values:                                                   *
*   none                                                           *
*                                                                  *
*******************************************************************/


$subpass:
        d0 = &*(sp --= 32)
        *(sp + 12) =w iprs
        *(sp + 8) =w d6
     || *(sp + 16) =w a4
        *(sp + 4) =w a12
     || *(sp + 0) =w d7
        *(sp + t) = d1

        tempd1 =sw *(xba + $STOP)
        tempd1 = tempd1 - 0
        br =[ne] done
        tempd1 =uh *(xba + $list_index)
        tempd1 = tempd1 - 0
        br =[eq] done
        nop
        nop

mloop:
        list =uw *(xba + $list)
        stats_val =uw *(xba + $stats_val)
        tempd1 =uh *(xba + $list_index)     ;repeat loop list_index times
        *(sp + i) = zero                    ;i = 0
        le0 = firstloop
        lrs0 = tempd1 - 1
        nop

        x8 = *(sp + i)
        tempd1 = x8 + 1
        m =uh *(list + [x8])
     || *(sp + i) = tempd1
        x0 = m

        tempd1 =uw *(xba + $BITE)
        le1 = endjloop
        lrs1 = tempd1 - 1
        tt = *(sp + t)
        tt = tt >>u 1

jloop:
        a0 = &*(pba + $sym_array)
        tempd2 =sh *(stats_val + [x0])
     || tempd1 = tt
        tempd2 = tempd2 - 0
        tempd4 =[.nvz] 1 || tempd4 =[le.nvz] zero
        tempd1 =[le] -tt
        tempd2 = tempd2 - tempd1
        *(stats_val + [x0]) =sh tempd2

        x0 =uh *(pba + $sym_index)
        tempd2 = x0 + 1

        ; if (sym_index>240) do_syms();
        tempd1 = tempd2 - 240
        call =[gt] $do_syms
        *(pba + $sym_index) =uh tempd2

        ; sym_array[sym_index++] = sym;
        *(a0 + [x0]) =ub tempd4
     || tt = tt >>u 1

endjloop:
        x0 = m

firstloop:
        x8 = *(sp + i)

done:
        a12 =sw *(sp + 4)
        a4 =sw *(sp + 16)
        br = *(sp + 12)
        d6 =sw *(sp + 8)
     || d7 =sw *(sp + 0)
        d0 = &*(sp ++= 32)
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/******************************************************************* 

**
** ztr.s (PP Program)
**
** Written by Jim Witham, Code 472300D, 939-3599
**
** File that contains the following assembly language subroutines:
**
** $comp_ztr
**
********************************************************************/
        .global $comp_ztr

/*******************************************************************
*                                                                  *
* Function  : $comp_ztr                                            *
* Args      : s                                                    *
* Passed in : d1                                                   *
*                                                                  *
* Description:                                                     *
*   s - the subblock to process                                    *
*                                                                  *
*   $comp_ztr will be called to calculate the ztl array for a      *
*   single subblock.                                               *
*                                                                  *
* Return Values:                                                   *
*   None                                                           *
*                                                                  *
*******************************************************************/

$comp_ztr:
        lctl = 0x0                  ;reset looping capability

        ;stats_val[s][m=255] = stats_val + (s*256 + m) * (2 bytes/word)
        ;(0x1fe, 0x3fe, 0x5fe, 0x7fe, 0x9fe, 0xbfe)
        d2 = d1 << 9
        x0 = d2 + 0x1fe
        nop
        a0 = &*(dba + x0)

        ;ztl[s][p=63] = ztl + (s*64 + p) * (2 bytes/word)
        ;(0xc7e, 0xcfe, 0xd7e, 0xdfe, 0xe7e, 0xefe)
        d2 = d1 << 7
        x0 = d2 + 0xc7e
        nop
        a2 = &*(dba + x0)
        nop
        a8 = a2

        ;clear out ztl[s][0..63] array
        x0 = d2 + 0x0c00
        nop
        a1 = &*(dba + x0)
        lrse0 = 31
        d1 = 0
        nop
        *(a1++=[1]) = d1

        lr0 = 3                     ;first innerloop
        lr1 = 3                     ;second innerloop
        lr2 = 11                    ;outerloop
        le0 = firstloopend2
        ls0 = firstloopstart2
        le1 = secondloopend2
        ls1 = secondloopstart2
        le2 = outerloopend2
        ls2 = outerloopstart2
        nop
        lctl = 0xba9                ;associate le0 with lc0, le1 with lc1, and le2 with lc2
        nop
        nop
        d4 =sh *(a0--=[1])

outerloopstart2:
        nop

firstloopstart2:
        d4 = |d4|
        d4 = 31 - lmo(d4)
     || d2 =h *(a2)
        d4 = 1<<d4
     || d3 =sh *(a0--=[1])
        d3 = |d3|
        d3 = 31 - lmo(d3)
        d3 = 1<<d3
        d2 = d2 | d4 | d3
     || d4 =sh *(a0--=[1])

firstloopend2:
        *(a2--=[1]) =h d2
        a7 =uh *(a2++=[4])

secondloopstart2:
        d4 = |d4|
        d4 = 31 - lmo(d4)
     || d2 =h *(a2)
        d4 = 1<<d4
     || d3 =sh *(a0--=[1])
        d3 = |d3|
        d3 = 31 - lmo(d3)
        d3 = 1<<d3
        d2 = d2 | d4 | d3
     || d4 =sh *(a0--=[1])

secondloopend2:
        *(a2--=[1]) =h d2

outerloopend2:
        nop

        lr0 = 1                     ;first innerloop
        lr1 = 1                     ;second innerloop
        lr2 = 5                     ;outerloop
        le0 = firstloopend3
        ls0 = firstloopstart3
        le1 = secondloopend3
        ls1 = secondloopstart3
        le2 = outerloopend3
        ls2 = outerloopstart3
        nop
        nop

outerloopstart3:
        nop

firstloopstart3:
        d4 = |d4|
     || d3 =h *(a8)
        d4 = 31 - lmo(d4)
     || d2 =h *(a2)
        d4 = 1<<d4
        d3 = d3 | d4
     || d4 =sh *(a0--=[1])
        d2 = d2 | d3
     || *(a8--=[1]) =h d3

        d4 = |d4|
     || *(a2) =h d2
        d4 = 31 - lmo(d4)
     || d3 =h *(a8)
        d4 = 1<<d4
     || d2 =h *(a2)
        d3 = d3 | d4
     || d4 =sh *(a0--=[1])
        d2 = d2 | d3
     || *(a8--=[1]) =h d3

firstloopend3:
        *(a2--=[1]) =h d2
        a7 =uh *(a2++=[2])

secondloopstart3:
        d4 = |d4|
     || d3 =h *(a8)
        d4 = 31 - lmo(d4)
     || d2 =h *(a2)
        d4 = 1<<d4
        d3 = d3 | d4
     || d4 =sh *(a0--=[1])
        d2 = d2 | d3
     || *(a8--=[1]) =h d3

        d4 = |d4|
     || *(a2) =h d2
        d4 = 31 - lmo(d4)
     || d3 =h *(a8)
        d4 = 1<<d4
     || d2 =h *(a2)
        d3 = d3 | d4
     || d4 =sh *(a0--=[1])
        d2 = d2 | d3
     || *(a8--=[1]) =h d3

secondloopend3:
        *(a2--=[1]) =h d2

outerloopend3:
        nop

        lr0 = 2                     ;first loop
        le0 = firstloopend4
        ls0 = firstloopstart4
        nop
        nop
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firstloopstart4: 

        d4 = |d4|
     || d3 =h *(a8)
        d4 = 31 - lmo(d4)
     || d2 =h *(a2)
        d4 = 1<<d4
        d3 = d3 | d4
     || d4 =sh *(a0--=[1])
        d2 = d2 | d3
     || *(a8--=[1]) =h d3

        d4 = |d4|
     || *(a2) =h d2
        d4 = 31 - lmo(d4)
     || d3 =h *(a8)
        d4 = 1<<d4
     || d2 =h *(a2)
        d3 = d3 | d4
     || d4 =sh *(a0--=[1])
        d2 = d2 | d3
     || *(a8--=[1]) =h d3
        *(a2) =h d2

        d4 = |d4|
     || d3 =h *(a8)
        d4 = 31 - lmo(d4)
     || d2 =h *(a2)
        d4 = 1<<d4
        d3 = d3 | d4
     || d4 =sh *(a0--=[1])
        d2 = d2 | d3
     || *(a8--=[1]) =h d3

        d4 = |d4|
     || *(a2) =h d2
        d4 = 31 - lmo(d4)
     || d3 =h *(a8)
        d4 = 1<<d4
     || d2 =h *(a2)
        d3 = d3 | d4
     || d4 =sh *(a0--=[1])
        d2 = d2 | d3
     || *(a8--=[1]) =h d3
        *(a2--=[1]) =h d2

firstloopend4:
        nop

        d4 = |d4|
     || d3 =h *(a8)
        d4 = 31 - lmo(d4)
     || d2 =h *(a2)
        d4 = 1<<d4
        d3 = d3 | d4
     || d4 =sh *(a0--=[1])
        d2 = d2 | d3
     || *(a8--=[1]) =h d3

        d4 = |d4|
     || *(a2) =h d2
        d4 = 31 - lmo(d4)
     || d3 =h *(a8)
        d4 = 1<<d4
     || d2 =h *(a2)
        d3 = d3 | d4
     || d4 =sh *(a0--=[1])
        d2 = d2 | d3
     || *(a8--=[1]) =h d3
        *(a2) =h d2

        d4 = |d4|
     || d3 =h *(a8)
        d4 = 31 - lmo(d4)
     || d2 =h *(a2)
        d4 = 1<<d4
        d3 = d3 | d4
     || d4 =sh *(a0--=[1])
        d2 = d2 | d3
     || *(a8--=[1]) =h d3

        d4 = |d4|
     || *(a2) =h d2
        d4 = 31 - lmo(d4)
     || d3 =h *(a8)
        d4 = 1<<d4
     || d2 =h *(a2)
        d3 = d3 | d4
        d2 = d2 | d3
     || *(a8) =h d3
        *(a2) =h d2

        br = iprs
        nop
        nop
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INTRAFRAME DECODING 


;Written by Jim Witham
; 3-27-97
; addconv.s

; To replace the following C Code that just took too much time...
; for(i=0;i<2048;i++)
; {
;     img[i] = img[i] + oldmean;
;     pic[i] = img[i];
;     if (img[i]>255)
;         pic[i] = 255;
;     if (img[i]<0)
;         pic[i] = 0;
; }

        .global $add_n_conv8

$add_n_conv8:
        d1 = d1<<2                  ;d1 is mean that is passed in
        d6 =rh d1                   ;replicate it to both words

        sr = 0xad
        x0 = 1
        x1 = 2

        d0 = 0
        d4 = 0x04000400             ;0400>>2 = 0100
        d5 = 0x03fc03fc             ;03fc>>2 = 00ff
        a2 = &*(dba + 1)

        le2 = L48
        lrs2 = 1023

        a1 = (dba)
        a0 = (dba)

        d1 = *(a1++=[x0])
        d1 =me d1 + d6
        d3 = (d0 & @mf) | (d1 & ~@mf)   ;if (val+mean<0) then d3 = 0 else d3 = d1

        d2 =me d1 - d4
        d3 = (d3 & @mf) | (d5 & ~@mf)   ;if (val+mean>0x400) then d3 = 0x3fc else d3 = d3

        d3 = d3 >>u 2                   ;shift down to proper scaling

        *(a2++=[x1]) =b d3
     || d2 =h1 d3

L48:
        *(a0++=[x1]) =b d2

        sr = 0x36
        br = iprs
        nop
        nop
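
For reference, the C loop that addconv.s replaces can be written as a self-contained scalar function. This is a minimal sketch assuming img holds int samples and pic holds 8-bit output pixels; the listing does not state the element types, and the function name add_mean_clamp is hypothetical.

```c
#include <stddef.h>

/* Scalar reference for the loop quoted in the addconv.s header comment:
 * add the mean to each sample, then saturate the result to 0..255.
 * Element types are assumptions, not taken from the original source. */
static void add_mean_clamp(int *img, unsigned char *pic, size_t n, int oldmean)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int v = img[i] + oldmean;

        img[i] = v;              /* img keeps the unclamped sum */
        if (v > 255)
            v = 255;             /* saturate high */
        if (v < 0)
            v = 0;               /* saturate low */
        pic[i] = (unsigned char)v;
    }
}
```

The assembly version appears to perform the same two comparisons with multiple-arithmetic masks on packed halfwords, which is why the constants in the listing are pre-shifted by two bits.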




NAWCWD TP 8442 


/* DECLARATIONS FOR ARITHMETIC CODING AND DECODING */ 

/* Size of arithmetic code values */ 

#define Code_value_bits 9               /* Number of bits in a code value */
typedef short code_value;               /* Type of arithmetic code value */

#define Top_value (((long)1<<Code_value_bits)-1)    /* Largest code value */

/* Half and Quarter points in code value range */

#define First_qtr (Top_value/4+1)       /* Point after first quarter */
#define Half      (2*First_qtr)         /* Point after first half */
#define Third_qtr (3*First_qtr)         /* Point after third quarter */
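
With nine-bit code values these macros evaluate to small fixed numbers, which is useful when reading the decoder assembly: the literal 384 compared against high in $new_decode_symbol equals Third_qtr. The sketch below repeats the definitions so that it compiles on its own; the helper quarter_of is hypothetical and only illustrates how the range divides.

```c
/* Same definitions as the declarations above, repeated so the sketch
 * is self-contained. */
#define Code_value_bits 9
#define Top_value (((long)1 << Code_value_bits) - 1)   /* 511 */
#define First_qtr (Top_value / 4 + 1)                  /* 128 */
#define Half      (2 * First_qtr)                      /* 256 */
#define Third_qtr (3 * First_qtr)                      /* 384 */

/* Hypothetical helper: which quarter (0..3) of the code range holds v.
 * Assumes 0 <= v <= Top_value. */
static int quarter_of(long v)
{
    return (int)(v / First_qtr);
}
```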
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mpcl -s -g -c -i\pcic80\include mp.c 

mvplnk -x mp.obj num2.obj \nawcwip.obj \dsp_util.obj \frame_ra.obj pp0.out pcic80.cmd -o mp.out -m mp.map
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erase *.o 
ppcl -s -k main.c 
ppcl -s -k j_dec.c 
ppcl -s -k modlp.c 
ppcl -o2 form.c 
ppasm decbits.s 
ppasm hvdec.s 
ppasm addconv.s 
ppasm quick.s 
ppasm newardec.s 

mvplnk -x main.o hvdec.o decbits.o modlp.o addconv.o j_dec.o newardec.o quick.o form.o pcic80a.cmd -t runpp0 -o pp0.out -m pp0.map
/*****************************************************************************
**
** decbits.s (PP Program)
**
** Written by Jim Witham, Code 472300D, 939-3599
**
** File that contains the following assembly language subroutines:
**
** $new_dec_dbits_ri
** $new_dec_dbits_nri
** $fast_cache_syms
** $new_decode_symbol
** I_DIV_JW2
** $new_update_model
** $new_dec_dbits3_ri
** $new_input_bit
** $quick_clear
**
*****************************************************************************/


        .global $new_dec_dbits_ri
        .global $new_dec_dbits_nri

        .global $T_BYTES
        .global $byte_stream
        .global $STOP
        .global $bit_index
        .global $update_model2
        .global $list
        .global $list_index

        .global $THRESH             ;signed int     sw *(xba + $THRESH)
        .global $stats_flag         ;unsigned char  pointer *(xba + $stats_flag)

        .global $stats_apx
        .global $index_to_char
        .global $freq
        .global $No_of_symbols
        .global $char_to_index

        .global $TMASK              ;unsigned short uh *(xba + $TMASK)
        .global $BITE               ;signed int     sw *(xba + $BITE)

        ; needed for cacheing of symbols
        .global $cache_syms
        .global $symbol_index
        .global $save_stop
        .global $pruned_children

CACHE   .set 1
FRED2   .set 1
;* FUNCTION DEF: $new_dec_dbits_ri( m,c1,c3,s)  *
;*                                  d1,d2,d3,d4 *

tempd1  .set d1
tempd2  .set d2
tempd3  .set d3
tflg    .set d4
tmask   .set d5
symbol  .set d6
t       .set d7

        .align 512

$new_dec_dbits_ri:
        d0 = &*(sp --= 40)
        *(sp + 16) =w iprs
        *(sp + 12) =w a4
     || *(sp + 8) =w d6
        *(sp + 4) =w a12
     || *(sp + 0) =w d7

        ;compute sym
        *(sp + 24) = d1
        *(sp + 28) = d2
        *(sp + 32) = d3
        *(sp + 36) = d4


        .if CACHE == 0
        call = $new_decode_symbol
        nop
        nop
        x8 = d5
        a12 = &*(xba + $index_to_char)
        nop
        symbol =ub *(a12 + x8)          ;d1 = index_to_char[sym]
        call = $new_update_model
        nop
        d1 = d5
        .else
        ; symbol = save_symbol[symbol_index]
        ; STOP = save_stop[symbol_index]
        ; symbol_index = symbol_index + 1;
        ; if (symbol_index==16) cache_syms();
        x8 =ub *(xba + $symbol_index)
        a12 =uw *(xba + $save_symbol)
        a11 = &*(xba + $save_stop)
        symbol =ub *(a12 + x8)          ;get next symbol
        d1 =ub *(a11 + x8)              ;get next STOP value
        *(xba + $STOP) =uw d1
        x8 = x8 + 1
        *(xba + $symbol_index) =ub x8
        x8 = x8 - 16                    ;call cache_syms if we need more symbols
        call =[eq] $cache_syms
        call =[eq] $fast_cache_syms
        nop
        nop
        .endif

        d1 = *(sp + 24)
        d2 = *(sp + 28)
        d3 = *(sp + 32)
        d4 = *(sp + 36)


        a1 =uw *(xba + $stats_apx)      ;*(a1) = stats_apx[s*256]
        a2 =uw *(xba + $stats_flag)
        d5 = d4 << 9
        a1 = a1 + d5
        d5 = d4 << 8
        a2 = a2 + d5
        d5 = d1
        d5 = d5 + (d4 << 8)
        *(sp + 20) =uh d5
        a0 = a2                         ;*(a0) = stats_flag[s*256]
        x0 = d1                         ;x0 = m
        x1 = d3                         ;*(a2 + [x1]) = stats_flag[c3]
        x2 = d3 + 1                     ;*(a2 + [x2]) = stats_flag[c4 = c3 + 1]
        a2 = a2 + d2                    ;*(a2) = stats_flag[s*256 + c1]
                                        ;*(a2 + [1]) = stats_flag[c2 = c1 + 1]

        ;*(a1 + [x0]) = stats_apx[s*256 + m]
        ;*(a0 + [x0]) = stats_flag[s*256 + m]
        ;*(a2)        = stats_flag[s*256 + c1]
        ;*(a2 + [1])  = stats_flag[s*256 + c2]   (c2 = c1 + 1)
        ;*(a2 + [x1]) = stats_flag[s*256 + c3]
        ;*(a2 + [x2]) = stats_flag[s*256 + c4]   (c4 = c3 + 1)


        tempd1 =sw *(xba + $BITE)
        tempd1 = 1 << tempd1
        t = symbol - tempd1
        br =[le] sym_gt1
        nop
        nop

        tempd1 = 3
        *(a0 + [x0]) =ub tempd1
        x8 =uh *(pba + $list_index)
        a15 = x8 - 254
        br =[ge] nosig
        tempd1 =uh *(sp + 20)
        a12 =uw *(pba + $list)
        tempd2 = x8 + 1
        *(a12 + [x8]) =uh tempd1
        *(pba + $list_index) =uh tempd2

nosig:
        ; stats_apx[m] = ((t*THRESH)>>(BITE-1)) + (THRESH>>BITE)
        tempd1 =sw *(xba + $BITE)
        tempd2 =uh *(xba + $THRESH)
        tflg = 1 - tempd1
        tmask =u tempd2 * t
        tempd1 = -tempd1
        tempd3 = (tmask >>u -tflg)
        br = done
        tempd3 = tempd3 + (tempd2 >>u -tempd1)
        *(a1 + [x0]) =h tempd3

sym_gt1:
        t = symbol - 1
        br =[le] sym_eq0
        ; nop
        nop

        tempd1 = 1
        *(a0 + [x0]) =ub tempd1
        x8 =uh *(pba + $list_index)
        a15 = x8 - 254
        br =[ge] nosig2
        tempd1 =uh *(sp + 20)
        a12 =uw *(pba + $list)
        tempd2 = x8 + 1
        *(a12 + [x8]) =uh tempd1
        *(pba + $list_index) =uh tempd2

nosig2:
        ; stats_apx[m] = -((t*THRESH)>>(BITE-1)) - (THRESH>>BITE)
        tempd1 =sw *(xba + $BITE)
        tempd2 =uh *(xba + $THRESH)
        tflg = 1 - tempd1
        tmask =u tempd2 * t
        tempd1 = -tempd1
        tempd3 = -(tmask >>u -tflg)
        br = done
        tempd3 = tempd3 - (tempd2 >>u -tempd1)
        *(a1 + [x0]) =h tempd3

sym_eq0:
        t = symbol - 0
        br =[ne] done
        ; nop
        ; nop

        tempd2 =ub *(a2)
        tempd2 = tempd2 | 4
        *(a2) =ub tempd2

        tempd2 =ub *(a2 + [1])
        tempd2 = tempd2 | 4
        *(a2 + [1]) =ub tempd2

        tempd2 =ub *(a2 + [x1])
        tempd2 = tempd2 | 4
        *(a2 + [x1]) =ub tempd2

        tempd2 =ub *(a2 + [x2])
        tempd2 = tempd2 | 4
        *(a2 + [x2]) =ub tempd2

        a0 = &*(pba + $pruned_children)
        x0 = *(sp + 36)
        nop
        tempd1 =ub *(a0 + x0)
        tempd1 = tempd1 + 1
        *(a0 + x0) =ub tempd1

done:
        a12 =w *(sp + 4)
        a4 =w *(sp + 12)
        br = *(sp + 16)
        d6 =sw *(sp + 8)
     || d7 =sw *(sp + 0)
        d0 = &*(sp ++= 40)
        ; branch occurs here
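
The reconstruction formula quoted in the comments of $new_dec_dbits_ri can be checked with a short C sketch. The magnitude expression is taken from the listing's comment; the wrapper decode_approx is an interpretation of the branch structure as read from the assembly (symbols above 1<<BITE treated as positive, symbols from 2 up to 1<<BITE as negative) and is not the original C source. Both function names are hypothetical.

```c
/* Magnitude term from the listing's comment:
 *   ((t*THRESH) >> (BITE-1)) + (THRESH >> BITE)
 * Assumes bite >= 1 and non-negative t and thresh. */
static int approx_magnitude(int t, int thresh, int bite)
{
    return ((t * thresh) >> (bite - 1)) + (thresh >> bite);
}

/* Hypothetical wrapper mirroring the branches as read from the
 * assembly; symbols 0 and 1 leave the approximation untouched there,
 * so this sketch returns 0 for them. */
static int decode_approx(int symbol, int thresh, int bite)
{
    if (symbol > (1 << bite))
        return approx_magnitude(symbol - (1 << bite), thresh, bite);
    if (symbol > 1)
        return -approx_magnitude(symbol - 1, thresh, bite);
    return 0;
}
```

For example, with THRESH = 64 and BITE = 2 the symbol 5 gives t = 1 and an approximation of 32 + 16 = 48.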


;* FUNCTION DEF: $new_dec_dbits_nri( m,c1,c3,s)  *
;*                                   d1,d2,d3,d4 *

$new_dec_dbits_nri:
        a2 =uw *(xba + $stats_flag)
        x8 = d4
        d4 = d4 << 8
        a2 = a2 + d4
        a0 = a2                         ;*(a0 + [x0]) = stats_flag[s*256 + m]
        x0 = d1                         ;x0 = m
        x1 = d3                         ;*(a2 + [x1]) = stats_flag[s*256 + c3]
        x2 = d3 + 1                     ;*(a2 + [x2]) = stats_flag[c4 = c3 + 1]
        a2 = a2 + d2                    ;*(a2) = stats_flag[s*256 + c1]
                                        ;*(a2 + [1]) = stats_flag[c2 = c1 + 1]

        tempd2 =ub *(a0 + [x0])
        tempd2 = tempd2 & 251
        *(a0 + [x0]) =ub tempd2

        tempd2 =ub *(a2)
        tempd2 = tempd2 | 4
        *(a2) =ub tempd2

        tempd2 =ub *(a2 + [1])
        tempd2 = tempd2 | 4
        *(a2 + [1]) =ub tempd2

        tempd2 =ub *(a2 + [x1])
        tempd2 = tempd2 | 4
        *(a2 + [x1]) =ub tempd2

        tempd2 =ub *(a2 + [x2])

        a8 = &*(pba + $pruned_children)
        nop
        tempd1 =ub *(a8 + x8)
        tempd1 = tempd1 + 1
        *(a8 + x8) =ub tempd1

        br = iprs
        tempd2 = tempd2 | 4
        *(a2 + [x2]) =ub tempd2

.global $cum_freq 
.global $value 
.global $low 
.global $high 

        .global $new_decode_symbol
        .global big_jump

        .align 512

std         .set x2
l           .set d6
save_symbol .set a4

        .global $read_more
        .global $syms_to_do

big_jump:
        *(pba + $syms_to_do) =uh std
        a15 = std - 0
        br =[eq] done_syms
        save_symbol =uw *(pba + $save_symbol)
        l = 0

        le0 = end_get - 8
        lrs0 = std - 1
        nop
        nop

        call = $new_decode_symbol
        nop
        nop

        x0 = d5
        a0 = &*(xba + $index_to_char)
        x1 = l
        tempd1 =ub *(a0 + x0)           ;d1 = index_to_char[sym]
        *(save_symbol + x1) =ub tempd1
        l = l + 1

        call = $new_update_model
        nop
        d1 = d5

        tempd1 = *(pba + $STOP)
        tempd1 = tempd1 - 1
        lc0 =[eq] 0
        nop
        nop
        nop

end_get:
        nop

        tempd1 = *(pba + $STOP)
        tempd1 = tempd1 - 1
        br =[ne] done_syms
        nop
        nop

        ; syms_to_do = l - symbol_index
        *(pba + $syms_to_do) =uh l
        *(pba + $read_more) =ub a15

done_syms:
        a4 = *(sp + 8)
        br = *(sp + 4)
        d6 =sw *(sp + 0)
        d0 = &*(sp ++= 12)

        .align 512

        .global $fast_cache_syms
        .global $symbol_bit_index
        .global $save_value
;***************************************************************
;* FUNCTION DEF: $fast_cache_syms                              *
;***************************************************************

; The assembly language subroutine will realize the following C code
;cache_syms()
;{
;unsigned char i;
;unsigned char sym;
;
; for (i=0;i<16;i++)
;   save_stop[i] = 1;
;
; for (i=0;i<16;i++)
; {
;   symbol_bit_index[i] = bit_index;
;   save_value[i] = value;
;   save_high[i] = high;
;   save_low[i] = low;
;   sym = new_decode_symbol();
;   save_symbol[i] = index_to_char[sym];
;   new_update_model(sym);
;   save_stop[i] = STOP;
;   if (STOP==1)
;     break;
; }
;
; STOP = save_stop[0];
; symbol_bit_index[16] = bit_index;
; save_value[16] = value;
; save_high[16] = high;
; save_low[16] = low;
; symbol_index = 0;
;
;stophere = 1;
;}


$fast_cache_syms:
        d0 = &*(sp --= 12)
        *(sp + 8) =w iprs
        *(sp + 4) =w a4
        *(sp + 0) =w d6

        a0 = &*(xba + $save_stop)
        d1 = 0x01010101
        *(a0 + [0]) =uw d1
        *(a0 + [1]) =uw d1
        *(a0 + [2]) =uw d1
        *(a0 + [3]) =uw d1

        le0 = end_for16
        lrs0 = 15

        a4 =uw *(pba + $save_value)
        d6 = 0

        x0 = d6
        a1 = &*(pba + $symbol_bit_index)
        d1 =uh *(pba + $bit_index)
        *(a1 + [x0]) =uh d1
        d1 = *(pba + $value)
        *(a4 ++= 4) = d1
        d1 = *(pba + $high)
        *(a4 + [16]) = d1
        call = $new_decode_symbol
        d1 = *(pba + $low)
        *(a4 + [33]) = d1

        x1 = d5
        a0 = &*(xba + $index_to_char)
        a1 = *(xba + $save_symbol)
        x0 = d6
        tempd1 =ub *(a0 + x1)           ;d1 = index_to_char[sym]
        call = $new_update_model
        *(a1 + x0) =ub tempd1
        d1 = d5

        x0 = d6
        a1 = &*(pba + $save_stop)
        d1 = *(pba + $STOP)
        d2 = d1 - 1
        lc0 =[eq] 0
        *(a1 + [x0]) =ub d1
        nop
        nop

end_for16:
        d6 = d6 + 1

        x0 = d6
        d1 =ub *(pba + $save_stop)
        *(pba + $STOP) = d1
        a1 = &*(pba + $symbol_bit_index)
        d1 =uh *(pba + $bit_index)
        *(a1 + [x0]) =uh d1
        d1 = *(pba + $value)
        *(a4 ++= 4) = d1
        d1 = *(pba + $high)
        *(a4 + [16]) = d1
        d1 = *(pba + $low)
        *(a4 + [33]) = d1
        *(pba + $symbol_index) =ub a15

        br = *(sp + 8)
        d6 = *(sp + 0)
     || a4 = *(sp + 4)
        d0 = &*(sp ++= 12)


        .global $cum_freq
        .global $value
        .global $low
        .global $high
        .global $new_input_bit
        .global I_DIV_JW2
        .global $new_decode_symbol
        .global $symbolspread

        .align 512

range       .set d0
sym         .set x9
cum         .set d5
tempd1      .set d1
tempd2      .set d2
high        .set d5
low         .set d6
value       .set d7
cum_freq    .set a0
cum_freq_0  .set x2

;***************************************************************
;* FUNCTION DEF: $new_decode_symbol                            *
;***************************************************************

$new_decode_symbol:
        a15 = &*(sp --= 12)
        *(sp + 0) =w d7
        *(sp + 4) =w d6

        ; delete me
        tempd1 =uh *(xba + $symbols_read)
        tempd1 = tempd1 + 1
        *(xba + $symbols_read) =uh tempd1

        value =sw *(xba + $value)
        high =sw *(xba + $high)
        low =sw *(xba + $low)
        cum_freq = &*(xba + $cum_freq)

        range = high - low
     || *(sp + 8) =w iprs
        cum = value - low
        range = range + 1
        cum = cum + 1
     || tempd1 =sh *(cum_freq)
        cum = cum * tempd1
     || cum_freq_0 = tempd1

        call = I_DIV_JW2
        tempd1 = cum - 1
        tempd2 = range
        sym = 0

iloop:
        tempd1 =sh *++cum_freq
        tempd1 = tempd1 - cum
        br =[gt] iloop
        sym = sym + 1
        nop

        high =sh *(cum_freq - [1])
        call = I_DIV_JW2
        tempd1 = high * range
        tempd2 = cum_freq_0

        high = high - 1
        low = high + low                ;save high in low

        tempd1 =sh *(cum_freq)
        call = I_DIV_JW2
        tempd1 = tempd1 * range
        tempd2 = cum_freq_0

        high = low
     || low = high
        tempd1 =sw *(xba + $low)
        low = low + tempd1

for_ever:
        a15 = high - (1\\8)
        br =[lt] done_if
        nop
        nop

        a15 = low - (1\\8)
        br =[lt] else2
        nop
        nop

        value = value - (1\\8)
        br = done_if
        low = low - (1\\8)
        high = high - (1\\8)

else2:
        tempd1 = 0
        a15 = low - (1\\7)
        tempd1 =[lt] 1
        a15 = high - 384
        tempd1 =[ge] 1
        tempd1 = tempd1 - 0
        br =[ne] done_for
        nop
        nop

        value = value - (1\\7)
        low = low - (1\\7)
        high = high - (1\\7)

done_if:
        low = low << 1
        high = high << 1
        high = high + 1
        call = $new_input_bit
        x1 = d5
        value = value << 1
        br = for_ever
        value = value + d5
        d5 = x1

done_for:
        *(xba + $value) =sw value
        *(xba + $high) =sw high
        *(xba + $low) =sw low
        d5 = x9

        br = *(sp + 8)
        d7 = *(sp + 0)
     || d6 = *(sp + 4)
        d0 = &*(sp ++= 12)
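
The for_ever loop in $new_decode_symbol follows the familiar renormalization step of an integer arithmetic decoder. The C sketch below restates that loop under the assumption that (1\\8) and (1\\7) in the listing stand for 256 and 128, which matches Half and First_qtr for nine-bit code values. The struct ac_state, the callback interface, and the demonstration bit source zero_bits are hypothetical names introduced here.

```c
#define HALF      256   /* assumed value of (1\\8) in the listing */
#define FIRST_QTR 128   /* assumed value of (1\\7) in the listing */
#define THIRD_QTR 384   /* literal used in the listing */

typedef struct {
    int low;
    int high;
    int value;
} ac_state;

/* Trivial bit source for demonstration only: always supplies 0. */
static int zero_bits(void *ctx)
{
    (void)ctx;
    return 0;
}

/* Shift out resolved bits until the interval straddles the midpoint
 * widely enough. next_bit is a caller-supplied bit reader. */
static void ac_renormalize(ac_state *s, int (*next_bit)(void *), void *ctx)
{
    for (;;) {
        if (s->high < HALF) {
            /* interval in lower half: nothing to subtract */
        } else if (s->low >= HALF) {
            s->value -= HALF;        /* interval in upper half */
            s->low -= HALF;
            s->high -= HALF;
        } else if (s->low >= FIRST_QTR && s->high < THIRD_QTR) {
            s->value -= FIRST_QTR;   /* interval in middle half */
            s->low -= FIRST_QTR;
            s->high -= FIRST_QTR;
        } else {
            break;                   /* corresponds to done_for */
        }
        s->low = 2 * s->low;
        s->high = 2 * s->high + 1;
        s->value = 2 * s->value + next_bit(ctx);
    }
}
```

In the assembly the bit reader is $new_input_bit, and the caller is expected to check STOP afterwards, as the note ahead of that routine points out.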


.global I_DIV_JW2 

;***************************************************************************
;
;* I_DIV.ASM v1.10 - Integer Divide
;* Copyright (c) 1993-1995 Texas Instruments Incorporated
;
; +----------------------------------------------------------------------
; | i_div.asm = PP assembly program that is used to return a 32-bit
; |             signed integer quotient from 32-bit signed integer
; |             division when called by a C program.
; +----------------------------------------------------------------------
;
; +----------------------------------------------------------------------
; | 32-bit Signed Integer Word Divide Subroutine :
; |  o Input 32-bit signed integer Operand 1 is in d1 (numerator).
; |  o Input 32-bit signed integer Operand 2 is in d2 (divisor).
; |  o Output 32-bit signed integer is in d5 (Answer = quotient).
; |  o Output 32-bit signed remainder is discarded.
; |  o 0 input divisor produces 0x80000000 output with overflow set.
; |  o Quotient = 0x80000000 sets overflow.
; |  o Number of Stack Words used = 3.
; |  o MF register is saved.
; |
; |  o NOTE: Loop Counter 2 Registers are used but NOT restored !
; +----------------------------------------------------------------------
;
; +----------------------------------------------------------------------
; | 32 bit / 32 bit ===> 32 bit signed quotient
; | Signed PP Integer Division
; | Numerator / Denominator = Quotient + Remainder (discarded)
; | Divide by 0 produces 80000000 and sets sr(V)
; | Divide Overflow is not possible if Divisor is non-zero,
; | except 80000000/ffffffff = 80000000 will set sr(V).
; | MF register is preserved.
; +----------------------------------------------------------------------
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.ptext 


; PP assembly code

arg1:   .set d1                         ; input argument 1 = Numerator (32 low bits)
arg2:   .set d2                         ; input argument 2 = Divisor (32 bits)
ans:    .set d5                         ; answer = 32 bit signed quotient
Div:    .set d3                         ; Input Divisor
Num:    .set d4                         ; Input high Numerator = 0
Tmp:    .set d5                         ; ALU output for each DIVI

I_DIV_JW2:                              ; Signed Word Integer Divide: Ans = Op1 / Op2
        Div = 0 - |arg2|                ; negate | divisor |
     || *(sp-=[3]) = Div                ; || push Div
        br =[z] Div_By_0                ; Divide By 0 ?
        Num = 0                         ; high numerator = 0
     || *(sp+[1]) = mf                  ; || push mf
     || *(sp+[2]) = Num                 ; || push Num
        mf = |arg1|                     ; input lo | numerator |

        lrse2 = 29                      ; loop count - 1
        Tmp = divi(Div, Num=Num)        ; 1-st divide iterate
        Tmp = divi(Div, Num=Tmp [n] Num)    ; 2-nd divide iterate
LoopSW: Tmp = divi(Div, Num=Tmp [n] Num)    ; divide iterate 3-32

        ans = mf                        ; | ans | = mf
     || Div = *sp++                     ; || pop Div
        Num = arg1 ^ arg2               ; quotient sign
     || br = iprs                       ; || return
        ans =[n] -ans                   ; quotient is negative,
     || mf = *sp++                      ; || pop mf
        Num = *sp++                     ; pop Num

Div_By_0:                               ; Divide By 0     \_ Optional Error
Div_Ovfl:                               ; Divide Overflow /  Return Code
        br = iprs                       ; return
     || Div = *sp++                     ; || pop Div
        mf = *sp++                      ; pop mf
        ans = 0 - 1<<31                 ; returns 0x80000000, sets sr(V)
     || Num = *sp++                     ; || pop Num ...[END]

tempd4  .set d4
tempd5  .set d5
fred    .set d0
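
The behaviour documented in the I_DIV header can be summarized in portable C. This is only a sketch of the stated contract (truncating signed divide, with 0x80000000 returned and overflow flagged for a zero divisor or for 0x80000000 / -1); it does not model the divi iteration itself. The function name idiv_sketch and the overflow output parameter are hypothetical stand-ins for the routine and the sr(V) flag.

```c
#include <stdint.h>

/* Sketch of the contract described in the I_DIV header comment.
 * *overflow stands in for the sr(V) status bit. */
static int32_t idiv_sketch(int32_t num, int32_t den, int *overflow)
{
    *overflow = 0;

    /* Zero divisor, or the single overflowing quotient. */
    if (den == 0 || (num == INT32_MIN && den == -1)) {
        *overflow = 1;
        return INT32_MIN;            /* 0x80000000 */
    }

    /* C99 division truncates toward zero, matching a divide of the
     * magnitudes followed by a sign fix-up. */
    return num / den;
}
```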


        .global $new_update_model

;*******************************************************************
;* FUNCTION DEF: $new_update_model                                 *
;*******************************************************************

$new_update_model:
        a0 = &*(xba + $cum_freq)
        tempd3 = 0
        tempd2 =sh *a0
        tempd2 = tempd2 - 75
        tempd3 =[ne] 1
        tempd2 =sw *(xba + $No_of_symbols)
        a15 = tempd2 - 0
        tempd3 =[lt] 1
        tempd3 = tempd3 - 0
        br =[ne] no_max
        a2 = d1
        le1 = no_max - 8

        tempd5 = tempd2 << 1
     || tempd4 = &*(xba + $freq)
        a0 = a0 + tempd5
        lrs1 = tempd2
        a8 = tempd5 + tempd4
        tempd3 = 0

        tempd2 =sh *a8
        tempd2 = tempd2 + 1
     || *a0-- =h tempd3
        tempd2 = tempd2 >>s 1
        *a8-- =h tempd2
     || tempd3 = tempd3 + tempd2

no_max:
        d3 = &*(xba + $freq)
        fred = tempd1 << 1
        a0 = fred + d3
        a8 = a0 - 2
        nop

        d4 =sh *a8
     || d3 =sh *a0
        d3 = d3 - d4
        br =[ne] nL11
        le2 = end_while - 8
        nop

freq_eq:
        fred = fred - 2
     || d3 =sh *--a8
        a2 = a2 - 1
     || d4 =sh *--a0
        d3 = d4 - d3
        br =[eq] freq_eq
        d2 =g a2
        a15 = d2 - d1

        br =[ge] nL11
        x0 = &*(xba + $index_to_char)
        x8 = &*(xba + $char_to_index)

        a1 = d1
        a8 =ub *(a2 + x0)
        a9 =ub *(a1 + x0)

        *(a2 + x0) =b a9
        *(a1 + x0) =b a8
        *(a8 + x8) =b a1
        *(a9 + x8) =b a2

nL11:
        a15 = a2 - 0
        br =[le] end_while
        d2 = a2 - 1
        d1 = &*(xba + $cum_freq)
        lrs2 = d2
        a1 = fred + d1
        nop

        fred =sh *--a1
        fred = fred + 1
        *a1 =h fred

end_while:
        br = iprs
     || d3 =sh *a0
        d3 = d3 + 1
        *a0 =h d3
        ; branch occurs here
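
$new_update_model appears to follow the standard adaptive frequency-model update used with arithmetic coders: halve all counts when the total reaches a ceiling (75 in the listing), move the coded symbol ahead of equally frequent neighbours, then bump its count and the cumulative counts above it. The C sketch below shows that technique on a small model; the struct layout, the alphabet size NSYM, and the function names are hypothetical, and the code is a reading of the assembly rather than the original C source.

```c
#define NSYM     4      /* hypothetical alphabet size for the sketch */
#define MAX_FREQ 75     /* ceiling compared against cum_freq[0] in the listing */

typedef struct {
    int freq[NSYM + 1];                 /* freq[0] is a 0 sentinel */
    int cum_freq[NSYM + 1];             /* cum_freq[0] is the total */
    unsigned char char_to_index[NSYM];
    unsigned char index_to_char[NSYM + 1];
} model;

/* Start with every symbol equally likely. */
static void model_init(model *m)
{
    int i;

    for (i = 0; i < NSYM; i++) {
        m->char_to_index[i] = (unsigned char)(i + 1);
        m->index_to_char[i + 1] = (unsigned char)i;
    }
    for (i = 0; i <= NSYM; i++) {
        m->freq[i] = 1;
        m->cum_freq[i] = NSYM - i;
    }
    m->freq[0] = 0;
}

/* Update the model after coding the symbol at index `symbol` (1..NSYM). */
static void model_update(model *m, int symbol)
{
    int i;

    if (m->cum_freq[0] == MAX_FREQ) {   /* halve counts at the ceiling */
        int cum = 0;
        for (i = NSYM; i >= 0; i--) {
            m->freq[i] = (m->freq[i] + 1) / 2;
            m->cum_freq[i] = cum;
            cum += m->freq[i];
        }
    }

    /* Find the first index that shares this symbol's frequency. */
    for (i = symbol; m->freq[i] == m->freq[i - 1]; i--)
        ;

    if (i < symbol) {                   /* swap the two table entries */
        unsigned char ch_i = m->index_to_char[i];
        unsigned char ch_s = m->index_to_char[symbol];
        m->index_to_char[i] = ch_s;
        m->index_to_char[symbol] = ch_i;
        m->char_to_index[ch_i] = (unsigned char)symbol;
        m->char_to_index[ch_s] = (unsigned char)i;
    }

    m->freq[i] += 1;
    while (i > 0) {                     /* raise totals above the symbol */
        i -= 1;
        m->cum_freq[i] += 1;
    }
}
```

The freq[0] sentinel of zero guarantees that the search loop stops at index 1 at the latest, since every real symbol keeps a count of at least one.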

.global $new_input_bit 

;*******************************************************************
;* FUNCTION DEF: $new_input_bit                                    *
;*******************************************************************

; Need to check for STOP in routine that calls this one!!!

$new_input_bit:
        d2 =uh *(xba + $bit_index)
        d4 = d2 >>u 3
     || d3 =uw *(xba + $byte_stream)
        a0 = d4 + d3
        d0 =uh *(xba + $T_BYTES)
     || d1 = d2 + 1
        d4 = -(d2&7)
     || d3 =ub *a0
        d3 = (d3 >>u -d4)
        *(xba + $bit_index) =h d1
     || d5 = d3 & 1
        d1 = d1 - (d0 << 3)
        br =g iprs
        d1 = 1 || d1 =[lt] a15          ;calculate new STOP value
        *(xba + $STOP) =w d1
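
As read from the assembly, $new_input_bit fetches one bit from the byte stream, least significant bit of each byte first, advances bit_index, and sets STOP once the index reaches T_BYTES * 8. A minimal C sketch of that behaviour follows; the struct bit_reader and the function name input_bit are hypothetical, and the bit ordering is an inference from the shift by (bit_index & 7).

```c
/* Hypothetical reader state mirroring the globals used by the routine. */
typedef struct {
    const unsigned char *bytes;  /* byte_stream */
    unsigned bit_index;          /* index of the next bit to read */
    unsigned total_bytes;        /* T_BYTES */
    int stop;                    /* STOP flag */
} bit_reader;

/* Return the next bit and update the STOP flag. The caller must check
 * r->stop before reading again, as the listing's comment warns. */
static int input_bit(bit_reader *r)
{
    unsigned idx = r->bit_index;
    int bit = (r->bytes[idx >> 3] >> (idx & 7)) & 1;   /* LSB first */

    r->bit_index = idx + 1;
    r->stop = (r->bit_index >= r->total_bytes * 8u) ? 1 : 0;
    return bit;
}
```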


    .align 512

    .global $new_dec_dbits3_ri
    .global $save_symbol
    .global $save_index

;* FUNCTION DEF: $new_dec_dbits3_ri (m) *
;*                                   d1 *

$new_dec_dbits3_ri:
    d0 = &*(sp --= 40)
    *(sp + 16) =w iprs
    *(sp + 12) =w a4
 || *(sp + 8) =w d6
    *(sp + 4) =w a12
 || *(sp + 0) =w d7

    ;fetch symbol from save_symbol[save_index++]
    x0 =uh *(pba + $save_index)
    x1 = x0 + 1
    a0 =uw *(pba + $save_symbol)
    *(pba + $save_index) =uh x1
    symbol =ub *(a0 + x0)

    a1 =uw *(xba + $stats_apx)      ;*(a1) = stats_apx[s*768]
    a0 =uw *(xba + $stats_flag)
    x0 = d1                         ;x0 = m
    ;*(a1 + [x0]) = stats_apx[s*768 + m]
    ;*(a0 + [x0]) = stats_flag[s*768 + m]

    tempd1 =sw *(xba + $BITE)
    tempd1 = 1 << tempd1
    t = symbol - tempd1
    br =[le] sym_gtlad
    ; nop
    nop
    tempd1 = 3
    *(a0 + [x0]) =ub tempd1
    x8 =uh *(pba + $list_index)
    a15 = x8 - 254
    br =[ge] nosig3d
    tempd1 = x0
    a12 =uw *(pba + $list)
    tempd2 = x8 + 1
    *(a12 + [x8]) =uh tempd1
    *(pba + $list_index) =uh tempd2

nosig3d:
    ; stats_apx[m] = ((t*THRESH)>>(BITE-1)) + (THRESH>>BITE);
    tempd1 =sw *(xba + $BITE)
    tempd2 =uh *(xba + $THRESH)
    tflg = 1 - tempd1
    tmask =u tempd2 * t
    tempd1 = -tempd1
    tempd3 = (tmask >>u -tflg)
    br = donead
    tempd3 = tempd3 + (tempd2 >>u -tempd1)
    *(a1 + [x0]) =h tempd3

sym_gtlad:
    t = symbol - 1
    br =[le] donead
    nop
    ; nop
    tempd1 = 1
    *(a0 + [x0]) =ub tempd1
    x8 =uh *(pba + $list_index)
    a15 = x8 - 254
    br =[ge] nosig4d
    tempd1 = x0
    a12 =uw *(pba + $list)
    tempd2 = x8 + 1
    *(a12 + [x8]) =uh tempd1
    *(pba + $list_index) =uh tempd2

nosig4d:
    ; stats_apx[m] = -((t*THRESH)>>(BITE-1)) - (THRESH>>BITE);
    tempd1 =sw *(xba + $BITE)
    tempd2 =uh *(xba + $THRESH)
    tflg = 1 - tempd1
    tmask =u tempd2 * t
    tempd1 = -tempd1
    tempd3 = -(tmask >>u -tflg)
    tempd3 = tempd3 - (tempd2 >>u -tempd1)
    *(a1 + [x0]) =h tempd3

donead:
    a12 =w *(sp + 4)
    a4 =w *(sp + 12)
    br = *(sp + 16)
    d6 =sw *(sp + 8)
 || d7 =sw *(sp + 0)
    d0 = &*(sp ++= 40)
    ; branch occurs here

    .global $quick_clear

$quick_clear:
    a0 =uw *(xba + $stats_apx)
    lrse0 = 767
    a8 =uw *(xba + $stats_flag)
    d1 = 0
    *(a0 ++=[1]) = d1
 || *(a8 ++=[1]) =uh d1

    br = iprs
    nop
    nop

/*********************************************************************
**
** faster_ar.c (PP Program)
**
** Written by Jim Witham, Code 472300D, 939-3599
**
** Contains fast_arith_decode2 and the do_subdec subroutine.
**
*********************************************************************/

#include <mvp.h> 

#include "arith.h" 

#include "modlp.h" 

#define CACHE 
#define ASSEM 

extern unsigned char emubrk; 

extern unsigned char *stats_flag; 

extern int STOP; 

extern unsigned short syms_to_do; 
extern unsigned short *save_array; 
extern unsigned char read_more; 
extern unsigned short left_off; 
extern int FIRST_DEC; 
extern unsigned char RB_DEC; 
extern unsigned char PASS_DEC; 
extern unsigned short THRESH; 
extern unsigned short *list; 
extern unsigned short list_index; 

extern unsigned char index_to_char[Max_No_of_symbols+1];

extern short *stats_apx; 

extern unsigned short symbols_read; 

extern int BITE; 

extern unsigned char pruned_children[8]; 

extern unsigned char total_links; 

extern unsigned char link_list[6]; 

extern unsigned short bit_index; 

extern int value; /* Currently-seen code value */ 

extern int low, high; /* Ends of current code region */ 

extern int *save_value; 
extern int *save_high; 
extern int *save_low; 

extern unsigned short symbol_bit_index[17]; /* bit_index value after each
                                               symbol has been read into the cache */

extern unsigned char save_stop[16]; /* STOP value after each symbol has
                                       been read into the cache */

extern unsigned char symbol_index; /* index to next symbol in the cache */

void do_subdec();
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/* */ 

/* fast_arith_decode2 will be compiled with the -o2 (optimization full) */ 
/* command from the compiler to optimize this code for fast operation. */ 
/* */ 

/* The routine scans thru the zero-tree and turns symbols into */ 

/* coefficients where needed (pruning trees as we go...) */ 


/* */ 


void fast_arith_decode2()
{
    int i,m,c,s;

#ifdef CACHE
    fast_cache_syms();
#endif

    /* DECODE SIGNIFICANCE MAP */
    for (s=0;s<6;s++) pruned_children[s] = 0;

    for (s=0;s<6;s++)
        dec_dbits_fix(s);

    m = 0;
    for (s=0;s<6;s++)
    {
        if (pruned_children[s] < 1)
        {
            link_list[m++] = s;
            pruned_children[s] = 0;
        }
        else
        {
            pruned_children[s] = 255;
            for (i=(s*256)+1;i<((s*256)+4);i++)
            {
                stats_flag[i] = stats_flag[i] & 251;
            }
        }
    }

    total_links = m;

    for(m=1;m<4;m++)
    {
        c=4*m;

        for (i=0;i<total_links;i++)
        {
            s = link_list[i];
#ifdef ASSEM
            if ((stats_flag[m + s*256]&4)==4)
                new_dec_dbits_nri(m, c, 2, s);
            else if ((stats_flag[m + s*256]&5)==0)
                new_dec_dbits_ri(m, c, 2, s);
#else
            dec_dbits(m, c, c+1, c+2, c+3, s);
#endif
            if (STOP==1)
                break;
        }

        if (STOP==1)
            break;
    }

    m = 0;
    for (s=0;s<6;s++)
    {
        if (pruned_children[s] < 3)
        {
            link_list[m++] = s;
            pruned_children[s] = 0;
        }
        else
        {
            if (pruned_children[s] != 255)
            {
                for (i=(s*256+4);i<(s*256+16);i++)
                {
                    stats_flag[i] = stats_flag[i] & 251;
                }
            }
            pruned_children[s] = 255;
        }
    }

    total_links = m;


    for(m=4;m<16;m++)
    {
        c = 8*(m/2) + 2*(m%2);

        for (i=0;i<total_links;i++)
        {
            s = link_list[i];
#ifdef ASSEM
            if ((stats_flag[m + s*256]&4)==4)
                new_dec_dbits_nri(m,c,4,s);
            else if ((stats_flag[m + s*256]&5)==0)
                new_dec_dbits_ri(m,c,4,s);
#else
            dec_dbits(m,c,c+1,c+4,c+5,s);
#endif
            if (STOP==1)
                break;
        }

        if (STOP==1)
            break;
    }


    m = 0;
    for (s=0;s<6;s++)
    {
        if (pruned_children[s] < 12)
        {
            link_list[m++] = s;
            pruned_children[s] = 0;
        }
        else
        {
            if (pruned_children[s] != 255)
            {
                for (i=(s*256+16);i<(s*256+64);i++)
                {
                    stats_flag[i] = stats_flag[i] & 251;
                }
            }
            pruned_children[s] = 255;
        }
    }

    total_links = m;


    for(m=16;m<64;m++)
    {
        c = 16*(m/4) + 2*(m%4);

        for (i=0;i<total_links;i++)
        {
            s = link_list[i];
#ifdef ASSEM
            if ((stats_flag[m + s*256]&4)==4)
                new_dec_dbits_nri(m,c,8,s);
            else if ((stats_flag[m + s*256]&5)==0)
                new_dec_dbits_ri(m,c,8,s);
#else
            dec_dbits(m,c,c+1,c+8,c+9,s);
#endif
            if (STOP==1)
                break;
        }

        if (STOP==1)
            break;
    }

/* Eliminate if NSCALES =3 */ 

    m = 0;
    for (s=0;s<6;s++)
    {
        if (pruned_children[s] < 48)
        {
            link_list[m++] = s;
            pruned_children[s] = 0;
        }
        else
        {
            if (pruned_children[s] != 255)
            {
                for (i=(s*256+64);i<(s*256+256);i++)
                {
                    stats_flag[i] = stats_flag[i] & 251;
                }
            }
            pruned_children[s] = 255;




        }
    }

    total_links = m;
    emubrk = 2;

    if ((STOP==0) && (total_links!=0))
    {
        read_more = 0;
        left_off = 0;

        pre_dbits2();

        while (syms_to_do>0)
        {
            for (i=0;i<syms_to_do;i++)
            {
                m = save_array[i];
                new_dec_dbits3_ri(m);
            }

            syms_to_do = 0;

            if (read_more == 1)
            {
                pre_dbits2();
            }
        }
    }
    else
    {
        if (total_links == 0)
        {
            bit_index = symbol_bit_index[symbol_index];
            value = save_value[symbol_index];
            high = save_high[symbol_index];
            low = save_low[symbol_index];
        }
    }

    emubrk = 3;
    do_subdec();
}

void do_subdec()
{
    int t,tt,sym,symbol,i,m,j;

    /* restore pointer to start getting new symbols for a different symbol size */
    start_model(2);

    if (STOP != 1)
        PASS_DEC+=BITE;

    t = THRESH>>BITE;
    THRESH = THRESH>>BITE;

    if (PASS_DEC==BITE)
        BITE = RB_DEC;
    else
        BITE = 1;

#ifdef NO_MULTI_BITPLANE
    BITE = 1;
#endif

    /* DECODE RESOLUTION INCREASE */
    if (STOP==0)
    {
        for (i=0; i<list_index; i++)
        {
            m = list[i];
            tt = t;

            for(j=0;j<BITE;j++)
            {
                sym = new_decode_symbol();
                symbol = index_to_char[sym];
                new_update_model(sym);
                stats_apx[m] += (((stats_flag[m]&2)>0) ? 1:-1)*(((1&symbol)>0) ? 1:-1)*((tt+1)/2);
                tt = tt/2;
                if (STOP==1)
                    break;
            }

            if (STOP==1)
                break;
        }
    }

    start_model(1<<(BITE+1));
}
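
The refinement step inside `do_subdec` can be isolated as a small pure function. The following is only an illustrative sketch under stated assumptions: the name `refine_sketch` and its parameters are hypothetical, the decoded refinement bits are passed in as an array instead of being pulled from the arithmetic decoder, and the sign is taken from a flag that stands in for `stats_flag[m] & 2`.

```c
#include <assert.h>

/* Apply nbits refinement bits to one coefficient estimate.
 * approx   : current estimate (stats_apx[m] in the listing)
 * positive : nonzero if the coefficient is positive (stats_flag[m] & 2)
 * bits     : decoded refinement bits, one per bit plane
 * t        : starting step basis (THRESH >> BITE in the listing) */
static int refine_sketch(int approx, int positive, const int *bits,
                         int nbits, int t)
{
    int j;
    int tt = t;
    int sign = positive ? 1 : -1;

    assert(bits && nbits >= 0);
    for (j = 0; j < nbits; j++) {
        int dir = bits[j] ? 1 : -1;      /* grow or shrink the magnitude */
        approx += sign * dir * ((tt + 1) / 2);
        tt = tt / 2;                     /* halve the step for the next plane */
    }
    return approx;
}
```

Each bit moves the estimate toward or away from zero by a step that halves on every bit plane, which is the usual successive-approximation behavior.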
/*******************************************************************
*
* Function    : form_img
* Args        : s - subblock number (0-5)
*               p - pair number (0,1)
*
* Description:
*   form_img will be called to copy the coefficients from one
*   section of memory in zero-tree format to another place
*   in memory in in-place format
*
* Return Values:
*   None
*
*******************************************************************/

#define NSCALES 5

extern short *coeff_block;
extern short *stats_apx;

/*********************************************************************
FORM_IMG
*********************************************************************/

void form_img2(s, p)
int s,p;
{
    int k,j,l,i,ty,tx;

    i = s * 256;

    for (k=NSCALES-2;k>0;k--)
    {
        if (k==(NSCALES-2))
            for (j=0; j<(1<<(NSCALES-1)); j+=(1<<k))
                for (l=0; l<(1<<NSCALES); l+=(2<<k))
                    coeff_block[p*512+j*32+l] = stats_apx[i++];

        for (j=0; j<(1<<(NSCALES-1)); j+=(1<<k))
            for (l=(1<<k); l<(1<<NSCALES); l+=(2<<k))
                coeff_block[p*512+j*32+l] = stats_apx[i++];

        for (j=(1<<(k-1)); j<(1<<(NSCALES-1)); j+=(1<<k))
            for (l=0; l<(1<<NSCALES); l+=(2<<k))
                coeff_block[p*512+j*32+l] = stats_apx[i++];

        for (j=(1<<(k-1)); j<(1<<(NSCALES-1)); j+=(1<<k))
            for (l=(1<<k); l<(1<<NSCALES); l+=(2<<k))
                coeff_block[p*512+j*32+l] = stats_apx[i++];
    }

    for (j=0; j<(1<<(NSCALES-1)); j++)
        for (l=1; l<(1<<NSCALES); l+=2)
            coeff_block[p*512+j*32+l] = 0;
}
/******************************************************************************
**
** hvdec.s (PP Program)
**
** Written by Jim Witham, Code 472300D, 939-3599
**
** File that contains the following assembly language subroutines:
**
**     $subdec_vert
**     $subdec_horiz
**
******************************************************************************/


.global $subsyn_vert 
.global $subsyn_horiz 

.align 2048 


/*********************************************************************
*
* Function    : $subsyn_horiz
* Args        : outerloop,innerloop
* Passed in   : (d1 ,d2)
*
* Description:
*   outerloop - the width of the coefficient patch
*   innerloop - the height of the coefficient patch
*
*   $subsyn_horiz will be called to perform the wavelet
*   composition in the horizontal direction.
*
* Return Values:
*   None
*
*********************************************************************/


$subsyn_horiz:

    lctl = 0x0              ;reset looping capability

    ;set loop reload, counter
    lr0 = d1 - 1
    lr1 = d2 - 2

    a4 = d2                 ;innerloop
    d7 = 0                  ;k = 0

    ;Set up loop to iterate ((128 >> scale) - 2) times
    le1 = InnerLoopEnd2
    ls1 = InnerLoop2
    le0 = OuterLoopEnd2
    ls0 = OuterLoop2

    nop
    lctl = 0xa9             ;associate le0 with lc0 and le1 with lc1
    nop
    nop

OuterLoop2:
    d3 = a4
    d3 = d3 * d7
    d3 = d3 << 1
    x0 = d3
    nop
    a0 =h &*(dba + [x0])
    nop

    ;>>>> new img[k][0] = (img[k][0]-img[k][index])>>1;
    ;          [a0]        [a0]      [a0+1]
    d2 =sh *(a0)
    d4 =sh *(a0 + [1])
    d3 = d2 - d4
    d3 = (d3 >> 1)
    *(a0) =h d3
    nop

InnerLoop2:
    ;save first stored variable (d3?)
    ;img[k][2*l+index] -= ((img[k][2*l]+img[k][2*l+2*index])>>1);
    ;img[k][2*l] = (img[k][2*l]>>1) - ((img[k][2*l+index] + img[k][2*l-index])>>2)
    d3 =sh *(a0 + [1])
    d4 =sh *(a0 + [3])
    d2 =sh *(a0 + [2])
 || d5 = d3 + d4
    d2 = d2 >> 1
    d2 = d2 - (d5 >>s 2)
    *(a0 + [2]) =h d2

    ;img[k][2*l] = (img[k][2*l]<<1) + ((img[k][2*l+index]+img[k][2*l-index])>>1);
    ;img[k][2*l-index] += ((img[k][2*l] + img[k][2*l-2*index])>>1);
    d4 =sh *(a0++=[1])
    d5 = d2 + d4
    d3 = d3 + (d5 >>s 1)
    *(a0++=[1]) =h d3

InnerLoopEnd2:
    nop

    ;>>>> img[k][XSIZE-index] -= img[k][XSIZE-2*index]; old
    ;>>>> img[k][XSIZE-index] += img[k][XSIZE-2*index]; new
    d2 =sh *(a0)
    d3 =sh *(a0 + [1])
    d2 = d2 + d3
    *(a0 + [1]) =h d2

OuterLoopEnd2:
    d7 = d7 + 1

    br = iprs
    nop
    nop
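
For readers who do not follow PP assembly, the loads and stores of one pass through `OuterLoop2` can be mirrored in C. This is a hedged sketch rather than the author's code: the function name `subsyn_row_sketch` is hypothetical, the inner-loop trip count is assumed to be `n/2 - 1` for a row of `n` samples (consistent with `lr1 = d2 - 2` if the loop register holds the count minus one), and right shifts of negative values are assumed to be arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* One row of the horizontal pass, following the instruction order of
 * OuterLoop2 / InnerLoop2. n is the row length (assumed even, >= 4). */
static void subsyn_row_sketch(int16_t *p, int n)
{
    int it;

    assert(p && n >= 4 && n % 2 == 0);

    /* prologue: first sample */
    p[0] = (int16_t)((p[0] - p[1]) >> 1);

    for (it = 0; it < n / 2 - 1; it++) {
        /* even sample at offset 2, from its two odd neighbours */
        int even = (p[2] >> 1) - ((p[1] + p[3]) >> 2);
        p[2] = (int16_t)even;
        /* odd sample at offset 1, from the updated even neighbours */
        p[1] = (int16_t)(p[1] + ((even + p[0]) >> 1));
        p += 2;                          /* two post-increments of a0 */
    }

    /* epilogue: last odd sample has only one even neighbour */
    p[1] = (int16_t)(p[0] + p[1]);
}
```

The sketch processes a single row; the assembly repeats it `outerloop` times, advancing the row base through `d7`.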
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* * 


* Function    : $subsyn_vert
* Args        : outerloop,innerloop
* Passed in   : (d1 ,d2)
*
* Description:
*   outerloop - the width of the coefficient patch
*   innerloop - the height of the coefficient patch
*
*   $subsyn_vert will be called to perform the wavelet
*   composition in the vertical direction.
*
* Return Values:
*   None
*




$subsyn_vert:

    lctl = 0x0              ;reset looping capability

    lr0 = d1 - 1            ;outerloop
    lr1 = d2 - 2            ;innerloop
    a4 = d1

    d7 = 0                  ;k = 0

    ;Set up zero-overhead loops
    le1 = InnerLoopEnd
    ls1 = InnerLoop
    le0 = OuterLoopEnd
    ls0 = OuterLoop
    nop

    lctl = 0xa9             ;associate le0 with lc0 and le1 with lc1

    d3 = a4
    d4 = d3 + d3
    x1 = d4
    d4 = d4 + d3
    x2 = d4

OuterLoop:
    ;>>>> img[index][k] = (img[index][k]>>1) - ((img[0][k]+img[2*index][k])>>2);
    nop
    x0 = d7
    nop
    a0 =h &*(dba + [x0])
    nop
    x0 = a4

    ;>>>> img[0][k] += img[index][k];
    ;>>>> img[0][k] -= img[index][k];
    d3 =h *(a0)
    d4 =h *(a0 + [x0])
    d3 = d3 - d4
    *(a0) =h d3
    nop

InnerLoop:
    ;img[2*l+index][k] = (img[2*l+index][k]>>1) - ((img[2*l][k]+img[2*l+2*index][k])>>2);
    ;img[2*l][k] -= ((img[2*l+index][k] + img[2*l-index][k])>>1);
    d3 =sh *(a0 + [x0])
    d4 =sh *(a0 + [x2])
    d2 =sh *(a0 + [x1])
 || d5 = d3 + d4
    d2 = d2 - (d5 >> 1)
    *(a0 + [x1]) =h d2

    ;img[2*l][k] += ((img[2*l+index][k]+img[2*l-index][k]+2)>>1);
    ;img[2*l-index][k] = (img[2*l-index][k]<<1) + ((img[2*l][k] + img[2*l-2*index][k])>>1);
    d4 =sh *(a0++=[x0])
    d5 = d2 + d4
    d3 = d3 << 1
    d3 = d3 + (d5 >> 1)
    *(a0++=[x0]) =h d3

InnerLoopEnd:
    nop

    ;>>>> img[YSIZE-index][k] = (img[YSIZE-index][k]-img[YSIZE-2*index][k])>>1;
    ;>>>> img[YSIZE-index][k] = (img[YSIZE-index][k]<<1) + img[YSIZE-2*index][k];
    d2 =sh *(a0)
    d3 =sh *(a0 + [x0])
    d3 = d3 << 1
    d2 = d2 + d3
    *(a0 + [x0]) =h d2

OuterLoopEnd:
    d7 = d7 + 1

    br = iprs
    nop
    nop

#include <mvp.h>

#include "arith.h" 

#include "modlp.h" 

/* #define NO_MULTI_BITPLANE */

#define CACHE
#define ASSEM

#define XSIZE 512
#define YSIZE 256
#define NSCALES 5
#define AX 16 /* = XSIZE/BS */
#define AY 8 /* = YSIZE/BS */
#define BS 32 /* = 2 ^ NSCALES */

/* REAL-TIME: 2 Zerotrees/partition, last subband truncated */
int stophere; 

extern shared int global_maxval; 

extern short *coeff_block; 
extern short *stats_apx; 
extern unsigned char *stats_flag; 
extern unsigned char *byte_stream; 

extern char *tbuf; 

extern unsigned short *list; 
unsigned short list_index; 

short BYTE_TOTAL; 

extern int No_of_chars; 

extern int EOF_symbol; /* Index of EOF symbol */ 

extern int No_of_symbols; /* total # of symbols */ 

/* TRANSLATION TABLES BETWEEN CHARS AND SYMBOL INDEXES */ 

extern unsigned char char_to_index[Max_No_of_chars] ; 
extern unsigned char index_to_char[Max_No_of_symbols+1];

extern short int cum_freq[Max_No_of_symbols+1];

extern int buffer_index; /* JCW */ 

extern unsigned short T_BYTES; 

extern int STOP; 
extern int BYTE_CNT; 
extern unsigned short THRESH; 
extern int BITE; 
extern int SIG_COEF; 

/* extern int SIG[15]; */ 

extern short maxval; 

extern unsigned short bit_index; 
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extern int FIRST_DEC; 
unsigned char RB_DEC; 
unsigned char PASS_DEC; 
extern unsigned short *ALLOC;
void start_decoding();

void new_dec_dbits_ri (int m, int c1, int c3, int s);
void new_dec_dbits_nri(int m, int c1, int c3, int s);

void form_img2(int s, int p); 

/* The bit buffer */ 

unsigned char buffer; /* Bits buffered for output */ 

unsigned char bits_to_go; /* # bits free in buffer */ 

/* Current state of the decoding */ 

int value; /* Currently-seen code value */ 

int low, high; /* Ends of current code region */ 

unsigned char emubrk; 

unsigned char *pp_stop_encode = (unsigned char *) 0x010007D4; 

unsigned char *save_symbol; /* cache for symbols */ 

unsigned short save_index; 
unsigned short syms_to_do;

unsigned short *save_array; 
unsigned char read_more; 
unsigned short left_off; 

int *save_value; 
int *save_high; 
int *save_low; 

unsigned short symbols_read; 
unsigned char pass_num; 

/* needed for cacheing of symbols */ 

unsigned short symbol_bit_index[17]; /* bit_index value after each symbol has 

been read into the cache */ 

unsigned char save_stop[16]; /* STOP value after each symbol has been read 

into the cache */ 

unsigned char symbol_index; /* index to next symbol in the cache */ 

unsigned char pruned_children[8]; 
unsigned char total_links; 
unsigned char link_list[6]; 
/*********************************************************************
DEC_DBITS
*********************************************************************/

void dec_dbits_fix(s)
int s;
{
    int sym,symbol,t;
    int index;

    index = (s*256);

    if (((stats_flag[index]&1)==0) && ((stats_flag[index]&4)==0))
    {
        /* delete me */
        symbols_read += 1;

#ifdef CACHE
        STOP = save_stop[symbol_index];
        symbol = save_symbol[symbol_index++];
#else
        sym = new_decode_symbol();
        symbol = index_to_char[sym];
        new_update_model(sym);
#endif

        if (symbol > (1<<BITE))
        {
            stats_flag[index] = 3;
            if (list_index < 254)
                list[list_index++] = index;
            t = symbol-(1<<BITE);
            stats_apx[index] = ((t*THRESH)>>(BITE-1)) + (THRESH>>BITE);
        }
        else if (symbol > 1)
        {
            stats_flag[index] = 1;
            if (list_index < 254)
                list[list_index++] = index;
            t = symbol-1;
            stats_apx[index] = -((t*THRESH)>>(BITE-1)) - (THRESH>>BITE);
        }
        else if (symbol==0)
        {
            pruned_children[s] = 1;
            stats_flag[index+1] = stats_flag[index+1] | 4;
            stats_flag[index+2] = stats_flag[index+2] | 4;
            stats_flag[index+3] = stats_flag[index+3] | 4;
        }
    }
    else if ((stats_flag[index]&4)!=0)
    {
        pruned_children[s] = 1;
        stats_flag[index] = stats_flag[index] & 251;
        stats_flag[index+1] = stats_flag[index+1] | 4;
        stats_flag[index+2] = stats_flag[index+2] | 4;
        stats_flag[index+3] = stats_flag[index+3] | 4;
    }
    else
        stats_flag[index] = stats_flag[index] & 251;
}
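
The symbol-to-coefficient mapping used in `dec_dbits_fix` (and in the assembly routine `$new_dec_dbits3_ri`) can be checked in isolation. The sketch below is illustrative only: `dequantize_sketch` is a hypothetical name, and `bite` and `thresh` are passed as parameters instead of being read from the globals `BITE` and `THRESH`.

```c
#include <assert.h>

/* Map a decoded significance symbol to an initial coefficient estimate.
 * Returns 1 and writes *approx when the symbol marks a significant
 * coefficient; returns 0 for the two zero symbols (0 and 1). */
static int dequantize_sketch(int symbol, int bite, int thresh, int *approx)
{
    assert(bite >= 1 && approx);

    if (symbol > (1 << bite)) {              /* positive coefficient */
        int t = symbol - (1 << bite);
        *approx = ((t * thresh) >> (bite - 1)) + (thresh >> bite);
        return 1;
    }
    if (symbol > 1) {                        /* negative coefficient */
        int t = symbol - 1;
        *approx = -((t * thresh) >> (bite - 1)) - (thresh >> bite);
        return 1;
    }
    return 0;                                /* not significant in this pass */
}
```

With `bite = 2` and `thresh = 64`, for example, symbol 5 yields +48 and symbol 3 yields -80, so the alphabet of `1 << (bite + 1)` symbols covers both signs and `bite` bit planes at once.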
/*********************************************************************
P_DEC
*********************************************************************/

void p_dec()
{
    int k, kk, l, i,j,X,Y,sum,bite;

    /* Main Loop: Continue until stop condition is reached */
    FIRST_DEC = tbuf[0];

    if (FIRST_DEC == 6)
    {
        bite = 3;
        RB_DEC = 3;
    }
    else if (FIRST_DEC == 5)
    {
        bite = 3;
        RB_DEC = 2;
    }
    else if (FIRST_DEC == 4)
    {
        bite = 2;
        RB_DEC = 2;
    }
    else
    {
        bite = FIRST_DEC;
        RB_DEC = 1;
    }

    /* tell MP this PP is ready to decode the bit stream */
    asm(" x2 = 0x00002100");
    asm(" cmnd = x2");

    /* wait for all PPs to catch up (signed by int from MP) */
    while ((INTFLG&(1<<20))==0);
    INTFLG = 1<<20;

    /* read in global_maxval for this image (from the MP) */
    maxval = 1<<global_maxval;

    while (*pp_stop_encode == 0)
    {
        BYTE_CNT = 0;
        buffer_index = 0;
        STOP = 0;
        list_index = 0;
        PASS_DEC = 0;
        THRESH = maxval;
        BITE = bite;
        bit_index = 0;
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#ifdef NO_MULTI_BITPLANE 
BITE = 1; 

#endif 


        start_model(1<<(BITE+1));
        start_inputing_bits();

        /* wait for new bit stream */
        while ((INTFLG&(1<<20))==0);
        INTFLG = 1<<20;

        if (*pp_stop_encode == 0)
        {
            T_BYTES = ALLOC[0];
            quick_clear();
            stats_flag[1536] = 0;
            start_decoding();
            pass_num = 0; /* delete me */

            while (STOP == 0)
            {
                symbols_read = 0;
                fast_arith_decode2();
            }

            form_img2(0,0);
            form_img2(1,1);

            asm(" x2 = 0x00002100"); /* tell MP done with this block of
                                        coefficients */
            asm(" cmnd = x2");

            while ((INTFLG&(1<<20))==0); /* wait for MP to read coefficients */
            INTFLG = 1<<20;

            form_img2(2,0);
            form_img2(3,1);

            asm(" x2 = 0x00002100"); /* tell MP done with this block of
                                        coefficients */
            asm(" cmnd = x2");

            while ((INTFLG&(1<<20))==0); /* wait for MP to read coefficients */
            INTFLG = 1<<20;

            form_img2(4,0);
            form_img2(5,1);

            asm(" x2 = 0x00002100"); /* tell MP done with this block of
                                        coefficients */
            asm(" cmnd = x2");
        }
    }
}
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/* BIT INPUT ROUTINES */ 

/* INITIALIZE BIT INPUT */ 

start_inputing_bits()
{
    bits_to_go = 0;
}


/* START DECODING A STREAM OF SYMBOLS */
void start_decoding()
{
    int i,t;

    value = 0; /* Fill code value */
    for(i=1;i<=Code_value_bits;i++)
    {
        t = new_input_bit();
        value = 2*value+t;
    }

    low = 0;
    high = Top_value;
}


/* DECODE THE NEXT SYMBOL */ 
int decode_symbol()
{
    long range;
    int cum,t;      /* Cumulative freq calculated */
    int symbol;     /* Decoded symbol */

    range = (long)(high-low)+1;
    cum = (((long)(value-low)+1)*cum_freq[0]-1)/range;

    for (symbol=1;cum_freq[symbol]>cum;symbol++) ; /* Find symbol */

    /* Narrow ranges */
    high = low + (range*cum_freq[symbol-1])/cum_freq[0]-1;
    low = low + (range*cum_freq[symbol])/cum_freq[0];

    for (;;)
    {
        if (high<Half)
        {
            /* Do nothing */
        }
        else if (low>=Half)
        {
            value -= Half;
            low -= Half;
            high -= Half;
        }
        else if (low>=First_qtr && high<Third_qtr)
        {
            value -= First_qtr;
            low -= First_qtr;
            high -= First_qtr;
        }
        else break;

        low = 2*low;
        high = 2*high+1;
        t = new_input_bit();
        value = 2*value+t;
    }

    return symbol;
}

cache_syms()
{
    unsigned char i;
    unsigned char sym;

    for (i=0;i<16;i++)
        save_stop[i] = 1;

    for (i=0;i<16;i++)
    {
        symbol_bit_index[i] = bit_index;
        save_value[i] = value;
        save_high[i] = high;
        save_low[i] = low;
        sym = new_decode_symbol();
        save_symbol[i] = index_to_char[sym];
        new_update_model(sym);
        save_stop[i] = STOP;
        if (STOP==1)
            break;
    }

    STOP = save_stop[0];
    symbol_bit_index[16] = bit_index;
    save_value[16] = value;
    save_high[16] = high;
    save_low[16] = low;
    symbol_index = 0;

    stophere = 1;
}


pre_dbits2()
{
    int sym;
    int not_done;
    int i,j,l;
    int loop_counter;
    unsigned char num_in_cache;
    unsigned char tstop;
    unsigned char more_to_do;
    unsigned char temp_index;
    unsigned char cache_previously_exhausted;

    cache_previously_exhausted = read_more;

    quick_syms();

    if ((syms_to_do == 0)&&(cache_previously_exhausted == 0))
    {
        bit_index = symbol_bit_index[symbol_index];
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value = save_value[symbol_index] ; 
        high = save_high[symbol_index];
        low = save_low[symbol_index];
    }

    if (syms_to_do>0)
    {
        stophere = 2;

        if (cache_previously_exhausted == 0)
        {
            /* std always > 0, if we have gotten here */
            num_in_cache = 16 - symbol_index;
            save_index = symbol_index;

            if (syms_to_do > num_in_cache)
                tstop = save_stop[15];
            else
                tstop = save_stop[symbol_index + syms_to_do - 1];

            if (tstop == 1)
            {
                for (i=0;i<num_in_cache;i++)
                    if (save_stop[symbol_index + i] == 1)
                        break;

                syms_to_do = i + 1;
                read_more = 0;
                STOP = 1;
                return(0);
            }

            /* do we already have all the symbols we need in the cache */
            if (syms_to_do<=num_in_cache)
            {
                bit_index = symbol_bit_index[symbol_index + syms_to_do];
                value = save_value[symbol_index + syms_to_do];
                high = save_high[symbol_index + syms_to_do];
                low = save_low[symbol_index + syms_to_do];
                read_more = 0;
                return(0);
            }

            /* if we got here we need to read more symbols */
            more_to_do = syms_to_do - num_in_cache;
            i = 0;

            while ((STOP==0) && (i<more_to_do))
            {
                sym = new_decode_symbol();
                save_symbol[16 + i] = index_to_char[sym];
                new_update_model(sym);
                i = i + 1;
            }

            if (STOP==1)
            {
                syms_to_do = i + num_in_cache;
                read_more = 0;
            }
        }
        else
        {
            save_index = 0;
            i = 0;

            while ((STOP==0) && (i<syms_to_do))
            {
                sym = new_decode_symbol();
                save_symbol[i] = index_to_char[sym];
                new_update_model(sym);
                i = i + 1;
            }

            if (STOP==1)
            {
                syms_to_do = i;
                read_more = 0;
            }
        }
    }
}
/******************************************************************************
**
** main.c (PP Program)
**
** Written by Jim Witham, Code 472300D, 939-3599
**
** Main PP Program that calls all the other routines to perform the wavelet
** bitstream decoding and wavelet composition.
**
******************************************************************************/


j* 


#include "modlp.h" 


#define XCOLS 8 /* Max Columns that can be held in internal memory */
#define YROWS 4 /* Max Columns that can be held in internal memory */

#define XSIZE 512 /* Max Image Size */
#define YSIZE 240 /* Max Image Height */
#define ML 50 /* Max order of filter allowed */
#define NSCALES 4
#define AX 16 /* = XSIZE/BS */
#define AY 8 /* = YSIZE/BS */
#define BS 32 /* = 2 ^ NSCALES */


#define DECODE_STREAM 
#define DECODE 


#define DISPLAY 


unsigned short T_BYTES; 
unsigned char *pic; 
short *img; 

short *stats_val; 
short *stats_apx; 
unsigned char *stats_flag; 
unsigned int *stomp_flag;
unsigned int *stomp_ztl; 
unsigned char *byte_stream; 
short *coeff_block; 
unsigned short *ztl; 
unsigned char *tbuf; 
unsigned short *ALLOC; 

unsigned short *list; 

extern unsigned short *save_array;
extern unsigned short *save_symbol;
extern int *save_value;
extern int *save_high; 
extern int *save_low; 

int No_of_chars; 

int EOF_symbol; /* Index of EOF symbol */ 
int No_of_symbols; /* total # of symbols */ 

/* TRANSLATION TABLES BETWEEN CHARS AND SYMBOL INDEXES */ 

unsigned char char_to_index[Max_No_of_chars]; /* JCW old version was int */
unsigned char index_to_char[Max_No_of_symbols+1]; /* JCW old version was int */
short int cum_freq[Max_No_of_symbols+1]; /* JCW old version was int */
int buffer_index; /* JCW */ 

extern void subdec_vert(int outerloop,int innerloop); 
extern void subdec_horiz(int outerloop,int innerloop); 

extern int calc_n_sub_mean(int oldmean); 
extern void add_n_conv8(int oldmean); 

extern void subsyn_vert(int outerloop,int innerloop); 
extern void subsyn_horiz(int outerloop,int innerloop); 

extern int whoami();


/* ************************* main program ************************** */ 

cregister extern volatile unsigned int INTFLG; 

short column_outerloop[5] = { 8, 16, 32, 16, 8 };
short row_outerloop[5] = { 4, 8, 16, 8, 4 };
short column_innerloop[5] = { 120, 60, 30, 15, 8 };
short row_innerloop[5] = { 256, 128, 64, 32, 16 };

unsigned char vert_loop_index[6] = { 16, 4, 1, 1, 1 };
unsigned char horiz_loop_index[6] = { 15, 5, 1, 1, 1 };

short maxval; 

extern shared int mean_pp[4]; 
extern shared int global_mean; 

unsigned char *passes_d; 

extern unsigned char PASS_ENC,PASS_DEC; 

int STOP; 
int BYTE_CNT; 
unsigned short THRESH; 
int BITE; 
int SIG_COEF; 

int FIRST_ENC,FIRST_DEC; 
int FCNT_ENC,FCNT_DEC; 

/* delete me */ 

/* extern unsigned char symbols_in_pass[];
extern unsigned char min_in_pass[];
extern unsigned char max_in_pass[]; */
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main () 
{ 


    int k,l;
    int mean;

    /* initialize pic[], img[], stats_val[], stats_flag[], byte_stream[],
       coeff_block[] pointers */
    asm(" d1 = &*(dba)");
    asm(" *(xba+$pic) = d1");
    asm(" *(xba+$img) = d1");
    asm(" *(xba+$stats_val) = d1");
    asm(" *(xba+$stats_apx) = d1");
    asm(" d1 = &*(dba + 0x8000)");
    asm(" *(xba+$stats_flag) = d1");
    asm(" *(xba+$stomp_flag) = d1");
    asm(" *(xba+$coeff_block) = d1");
    asm(" x0 = 0x8602");
    asm(" nop");
    asm(" d1 = &*(dba + x0)");
    asm(" *(xba+$list) = d1");
    asm(" d1 = &*(dba + 0xc00)");
    asm(" *(xba+$ztl) = d1");
    asm(" *(xba+$stomp_ztl) = d1");
    asm(" *(xba+$save_array) = d1");
    asm(" d1 = &*(dba + 0xe00)");
    asm(" *(xba+$save_symbol) = d1");
    asm(" d1 = &*(dba + 0xf00)");
    asm(" *(xba+$save_value) = d1");
    asm(" d1 = &*(dba + 0xf44)");
    asm(" *(xba+$save_high) = d1");
    asm(" d1 = &*(dba + 0xf88)");
    asm(" *(xba+$save_low) = d1");
    asm(" d1 = &*(pba + 0x630)");
    asm(" *(xba+$byte_stream) = d1");
    asm(" d1 = &*(pba + 0x620)");
    asm(" *(xba+$passes_d) = d1");
    asm(" d1 = &*(pba + 0x5fc)");
    asm(" *(xba+$tbuf) = d1");
    asm(" d1 = &*(pba + 0x600)");
    asm(" *(xba+$ALLOC) = d1");

    FIRST_ENC = 4;
    FIRST_DEC = 4;
    FCNT_ENC = 0;
    FCNT_DEC = 0;

    /* Clear the message interrupt flag that comes from the MP, just in case */
    INTFLG = 1<<20;

    while (1)
    {
#ifdef DECODE_STREAM
        p_dec();
#endif

#ifdef DECODE
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/* Reconstruct image */ 

        for (k=NSCALES; k>=0; k--)
        {
            for(l=0;l<horiz_loop_index[k];l++)
            {
                /* wait for new pixels */
                while ((INTFLG&(1<<20))==0);
                INTFLG = 1<<20;

                subsyn_horiz(row_outerloop[k],row_innerloop[k]);
                if (k==0)
                    add_n_conv8(global_mean);

                /* tell MP done with this block of pixels */
                asm(" d7 = 0x00002100");
                asm(" cmnd = d7");
            }

            if (k!=0)
                for(l=0;l<vert_loop_index[k-1];l++)
                {
                    /* wait for new pixels */
                    while ((INTFLG&(1<<20))==0);
                    INTFLG = 1<<20;

                    subsyn_vert(column_outerloop[k-1],column_innerloop[k-1]);

                    /* tell MP done with this block of pixels */
                    asm(" d7 = 0x00002100");
                    asm(" cmnd = d7");
                }
        }

        /* extra synchronizing wait for interrupt */
        while ((INTFLG&(1<<20))==0);
        INTFLG = 1<<20;
#endif
    }
}
/*******************************************************************
**
** modlp.c (PP Program)
**
** Written by Jim Witham, Code 472300D, 939-3599
**
** Contains start_model subroutine. This subroutine initializes
** the translation tables and the frequency counters for the
** arithmetic coder.
**
*******************************************************************/

/* ADAPTIVE SOURCE MODEL */ 

#include "modlp.h" 

#include "arith.h" 

short int freq[Max_No_of_symbols+1]; /* Symbol frequencies */

extern int No_of_chars; 

extern int EOF_symbol; /* Index of EOF symbol */ 
extern int No_of_symbols; /* total # of symbols */ 

/* TRANSLATION TABLES BETWEEN CHARS AND SYMBOL INDEXES */ 

extern unsigned char char_to_index[Max_No_of_chars]; 

extern unsigned char index_to_char[Max_No_of_symbols+1];

extern short int cum_freq[Max_No_of_symbols+1];

extern int low,high; 

extern short bits_to_follow; 

extern unsigned char bits_to_go; 

extern unsigned char buffer; 

extern int BYTE_CNT; 

extern unsigned short T_BYTES; 

extern unsigned char *byte_stream; 

extern int buffer_index; 

extern short BYTE_TOTAL;

extern int STOP; 

/* INITIALIZE THE MODEL */ 

void start_model(nchars)
int nchars;
{
    int i;

    /* Initialize number of chars */
    No_of_chars = nchars;
    No_of_symbols = nchars;

    /* Setup translation tables */
    for (i=0; i<No_of_chars; i++)
    {
        char_to_index[i] = i+1;
        index_to_char[i+1] = i;
    }

    /* Initialize frequency counts */
    for(i=0;i<=No_of_symbols;i++)
    {
        freq[i] = 1;
        cum_freq[i] = No_of_symbols-i;
    }

    freq[0] = 0;
}
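
The companion update step is implemented in assembly in this listing, so a C version may help when reading it. The sketch below pairs the `start_model` initialization above with the textbook adaptive-model update from the Witten, Neal and Cleary arithmetic coder. That choice is an assumption: the `_sketch` function names are hypothetical, the arrays are local copies for illustration, and the update is not transcribed from the assembly routine. The limit of 75 is the `Max_frequency` value from modlp.h.

```c
#include <assert.h>

#define Max_No_of_chars   16
#define Max_No_of_symbols 17
#define Max_frequency     75

static int No_of_chars, No_of_symbols;
static unsigned char char_to_index[Max_No_of_chars];
static unsigned char index_to_char[Max_No_of_symbols + 1];
static short freq[Max_No_of_symbols + 1];
static short cum_freq[Max_No_of_symbols + 1];

/* Same initialization as start_model in the listing. */
static void start_model_sketch(int nchars)
{
    int i;

    assert(nchars >= 1 && nchars <= Max_No_of_chars);
    No_of_chars = nchars;
    No_of_symbols = nchars;
    for (i = 0; i < No_of_chars; i++) {
        char_to_index[i] = (unsigned char)(i + 1);
        index_to_char[i + 1] = (unsigned char)i;
    }
    for (i = 0; i <= No_of_symbols; i++) {
        freq[i] = 1;
        cum_freq[i] = (short)(No_of_symbols - i);
    }
    freq[0] = 0;    /* sentinel: differs from every real count */
}

/* Textbook update step (assumed, not taken from the assembly). */
static void update_model_sketch(int symbol)
{
    int i;

    assert(symbol >= 1 && symbol <= No_of_symbols);

    /* Halve all counts when the total reaches the limit. */
    if (cum_freq[0] == Max_frequency) {
        int cum = 0;
        for (i = No_of_symbols; i >= 0; i--) {
            freq[i] = (short)((freq[i] + 1) / 2);
            cum_freq[i] = (short)cum;
            cum += freq[i];
        }
    }

    /* Move the symbol to the front of its run of equal counts. */
    for (i = symbol; freq[i] == freq[i - 1]; i--)
        ;
    if (i < symbol) {
        int ch_i = index_to_char[i];
        int ch_symbol = index_to_char[symbol];
        index_to_char[i] = (unsigned char)ch_symbol;
        index_to_char[symbol] = (unsigned char)ch_i;
        char_to_index[ch_i] = (unsigned char)symbol;
        char_to_index[ch_symbol] = (unsigned char)i;
    }

    /* Bump the count and every cumulative count below it. */
    freq[i] += 1;
    while (i > 0) {
        i -= 1;
        cum_freq[i] += 1;
    }
}
```

Keeping the total at or below `Max_frequency` bounds the products formed in the decoder's range arithmetic, and the small limit of 75 makes the model adapt quickly to changing statistics.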
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/* INTERFACE TO THE MODEL */ 

/* Set of symbols that may be encoded */ 

#define Max_No_of_chars 16
#define Max_No_of_symbols 17

/* Cumulative Frequency Table */

#define Max_frequency 75 /* 16383 Maximum allowed frequency cnt 2^14-1 */
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/****************************************************************************** 
**
** mp.c (MP Program)
**
** Written by Jim Witham, Code 472330D, 927-1440
**
** MP Program that orchestrates data movement and kicks off PP's to implement
** the balanced wavelet algorithm written by Chuck Creusere.
**
******************************************************************************
*/

#include <stdlib.h> 

#include <stdio.h> 

#include <mvp.h> 

#include <mvp_hw.h> 
#include <mp_ptreq.h> 

#include "icl.h" 

#include "vil24.h" 

#include "vol.h" 

#include "bgl.h" 
#include "pcic80.h" 

#include "cil.h" 

/* define compression ratio. Ratio is actually #:1 compression */ 
#define HOST 

/* #define variable_compression */ 
#define compression 40 
#define headersize 0 

#define DECODE_STREAM 
#define DECODE 

#define DISPLAY 

/* #define SHOW_WAITFORHOST */ 

/* #define SHOW_DECODETIME */ 

/* #define SHOW_FRAMETIME */ 
#define SHOW_DSPUTIL 
#define SHOW_FRAMERATE 

/* #define INTER_DISPLAY */ 

/* #define OLD_DISPLAY */ 

/* #define DEBUG */ 

#define XSIZE 512 
#define YSIZE 240 

#define AX 16 
#define AY 15 

/* 

** Define a bit mask for host request bit 
*/ 

#define SigBit(x) (((UINT32)1) << (x)) 

#define HostRequestBitMask SigBit(8) 

#define C80ReadyBit 8 
#define C80DoneBit 9 

#define d_buffers 6 

/**************************************************************************/ 

extern int *number_pixels; 

extern int *logo; 

extern int *frame_rate; 

extern int *dsp_util; 

int encode_time,decode_time; 
unsigned int time_in_wait; 

unsigned int start_time; 

/* int subblock_time[16]; */ 

/* int e[6]; */ 

int time_colin,time_colout; 

int last_etime,last_dtime; 

#pragma DATA_SECTION(mean_pp,"sh_vars") 

#pragma DATA_SECTION(global_mean,"sh_vars") 

shared int mean_pp[4]; 
shared int global_mean; 

#pragma DATA_SECTION(local_maxval,"sh_vars") 

#pragma DATA_SECTION(global_maxval,"sh_vars") 

shared int local_maxval[4]; 
shared int global_maxval; 

/* UINT32 buf_addr[d_buffers]; */ 

UINT32 stream_address; 

/* unsigned char ptr_head =0; */ 

UINT32 ptr_tail = 0; 

PCIC80STAT ReturnVal; 

UINT32 *ping_pong_addr = (UINT32 *) 0x010107D0; 

UINT32 *changemod = (UINT32 *) 0x010107D4; 

unsigned short local_alloc[AX*AY/6]; 

unsigned char comp_table[5] = {10, 20, 40, 80, 100}; 

unsigned char last_comp = 2; 

/**************************************************************************/ 

static void SignalHandler(UINT32 Signals); 
static void init_alloc_table(UINT32 index); 
static void init_alloc_table2(UINT32 index); 

void task (void *arg) 

{ 

unsigned int times=0; 

unsigned int dx,dy; 
int i,j; 
int lp; 

unsigned char vert_loop_index[6],horiz_loop_index[6]; 
unsigned short jump_col[6]; 
unsigned int jump_row[6]; 

long *ptr; /* temp pointer */ 

PTREQ *p[14]; /* temp pointer to packet transfer structure */ 

int temp_maxval; 

/* JCWtest */ 

char string1[32]; 
int temp_str; 
float temp_time; 
float wait_time; 
unsigned int junk_time; 
float frame_time; 
unsigned int frame_hit; 

int junki; 

unsigned short junkfill = 0; 

unsigned short * fillme = (unsigned short *) 0x90320000; 

int k,l; 
long tempi; 
unsigned char tbuf; 

unsigned char * tbuf_pp0 = (unsigned char *) 0x010005fc; 

unsigned char * tbuf_pp1 = (unsigned char *) 0x010015fc; 

unsigned char * tbuf_pp2 = (unsigned char *) 0x010025fc; 

unsigned char * tbuf_pp3 = (unsigned char *) 0x010035fc; 

unsigned char min_first; 

unsigned char ppnexttask[4]; 
unsigned short ppalloc[4]; 
unsigned char current_block; 
unsigned short table_size[4]; 
unsigned char pprequesting; 
unsigned char ppinfo[4]; 
unsigned char ppdone; 

unsigned char *pp_stop_encode = (unsigned char *) 0x010007D4; 
unsigned char DONE1; 
unsigned int temp_ulong; 
unsigned int first_dec; 

NOCACHE_INT(number_pixels[0]) = 0x80; 

NOCACHE_INT(global_mean) = 0x5e; 

*pp_stop_encode = 0; 

/* initialize FIRST variable on all PPs - used for first pass # of bitplanes to 
process */ 

tbuf_pp0[0] = 4; 

tbuf_pp1[0] = 4; 

tbuf_pp2[0] = 4; 

tbuf_pp3[0] = 4; 

VolSetDisplay(VOL_VGA8_640); /* setup colour display for VIM-8 */ 

VolSetGreyLUT(); 

#ifdef HOST 

ReturnVal = CilSigHandler(SignalHandler) ; 
if (ReturnVal != CIL_OK) 

{ 

/*** Cannot register signal handler ***/ 
while (1) ; 
return; 

} 

#endif 


p[0] = (PTREQ *) (MP_PARM_RAM + 0x300); 
p[1] = p[0] + 1; 
p[2] = p[0] + 2; 
p[3] = p[0] + 3; 
p[4] = p[0] + 4; 
p[5] = p[0] + 5; 
p[6] = p[0] + 6; 
p[7] = p[0] + 7; 
p[8] = p[0] + 8; 
p[9] = p[0] + 9; 
p[10] = p[0] + 10; 
p[11] = p[0] + 11; 
p[12] = p[0] + 12; 
p[13] = p[0] + 13; 

/* Set MP list pointer to point to first PT */ 
ptr = (long *) (MP_PTREQ_PTR); 

*ptr = (long) p[0]; 

/* DRAM (8 bit data) -> VRAM (raw image) */ 

p[2]->link     = p[2];       /* point to next PT */ 
p[2]->word[0]  = 0x80000000; /* linear to VRAM */ 
p[2]->word[1]  = 0x80300000; /* Src address is DRAM */ 
p[2]->word[2]  = 0xb4000240; /* Dst address is VRAM */ 
p[2]->word[3]  = 0x00038000; /* Src B count Src A count */ 
p[2]->word[4]  = 0x00ff0200; /* Dst B count Dst A count */ 
p[2]->word[5]  = 0x00;       /* Src C count */ 
p[2]->word[6]  = 0;          /* Dst C count */ 
p[2]->word[7]  = 0x8000;     /* Src B pitch */ 
p[2]->word[8]  = 0x500;      /* Dst B pitch */ 
p[2]->word[9]  = 0x00;       /* Src C pitch */ 
p[2]->word[10] = 0x0000;     /* Dst C pitch */ 
p[2]->word[11] = 0;          /* Src transparency upper */ 
p[2]->word[12] = 0;          /* Src transparency lower */ 
p[2]->word[13] = 0;          /* Reserved */ 
p[2]->word[14] = 0;          /* Reserved */ 
/* DRAM (columns and 16 bit data) -> internal RAM */ 

/* Can be used 4 times then change dst <- this can be done 8 times */ 

p[4]->link     = p[4];       /* point to next PT */ 
p[4]->word[0]  = 0x80000002; /* Contig. Mem to int w/dst updt */ 
p[4]->word[1]  = 0x80320000; /* Src address is DRAM */ 
p[4]->word[2]  = 0x00000000; /* Dst address is internal */ 
p[4]->word[3]  = 0x00ff0010; /* Src B count Src A count */ 
p[4]->word[4]  = 0x00001000; /* Dst B count Dst A count */ 
p[4]->word[5]  = 0x00;       /* Src C count */ 
p[4]->word[6]  = 0;          /* Dst C count */ 
p[4]->word[7]  = 0x0400;     /* Src B pitch */ 
p[4]->word[8]  = 0x000;      /* Dst B pitch */ 
p[4]->word[9]  = 0x10;       /* Src C pitch */ 
p[4]->word[10] = 0x1000;     /* Dst C pitch */ 
p[4]->word[11] = 0;          /* Src transparency upper */ 
p[4]->word[12] = 0;          /* Src transparency lower */ 
p[4]->word[13] = 0;          /* Reserved */ 
p[4]->word[14] = 0;          /* Reserved */ 


/* internal RAM -> DRAM (columns and 16 bit data) */ 

/* Can be used 4 times then change src then this can be repeated 8 times */ 

p[5]->link     = p[5];       /* point to next PT */ 
p[5]->word[0]  = 0x80000200; /* int to Contig. Mem w/src updt */ 
p[5]->word[1]  = 0x00000000; /* Src address is internal */ 
p[5]->word[2]  = 0x80320000; /* Dst address is DRAM */ 
p[5]->word[3]  = 0x00001000; /* Src B count Src A count */ 
p[5]->word[4]  = 0x00ff0010; /* Dst B count Dst A count */ 
p[5]->word[5]  = 0x00;       /* Src C count */ 
p[5]->word[6]  = 0;          /* Dst C count */ 
p[5]->word[7]  = 0x0000;     /* Src B pitch */ 
p[5]->word[8]  = 0x400;      /* Dst B pitch */ 
p[5]->word[9]  = 0x1000;     /* Src C pitch */ 
p[5]->word[10] = 0x0010;     /* Dst C pitch */ 
p[5]->word[11] = 0;          /* Src transparency upper */ 
p[5]->word[12] = 0;          /* Src transparency lower */ 
p[5]->word[13] = 0;          /* Reserved */ 
p[5]->word[14] = 0;          /* Reserved */ 


/* DRAM (rows and 16 bit data) -> internal RAM */ 

/* Can be used 4 times then change dst then this can be repeated 8 times */ 

p[6]->link     = p[6];       /* point to next PT */ 
p[6]->word[0]  = 0x80000002; /* Contig. Mem to int w/dst updt */ 
p[6]->word[1]  = 0x80320000; /* Src address is DRAM */ 
p[6]->word[2]  = 0x00000000; /* Dst address is internal */ 
p[6]->word[3]  = 0x00001000; /* Src B count Src A count */ 
p[6]->word[4]  = 0x00001000; /* Dst B count Dst A count */ 
p[6]->word[5]  = 0x00;       /* Src C count */ 
p[6]->word[6]  = 0;          /* Dst C count */ 
p[6]->word[7]  = 0x0000;     /* Src B pitch */ 
p[6]->word[8]  = 0x000;      /* Dst B pitch */ 
p[6]->word[9]  = 0x0000;     /* Src C pitch */ 
p[6]->word[10] = 0x1000;     /* Dst C pitch */ 
p[6]->word[11] = 0;          /* Src transparency upper */ 
p[6]->word[12] = 0;          /* Src transparency lower */ 
p[6]->word[13] = 0;          /* Reserved */ 
p[6]->word[14] = 0;          /* Reserved */ 
/* internal RAM -> DRAM (rows and 16 bit data) */ 

/* Can be used 4 times then change src then this can be repeated 8 times */ 

p[7]->link     = p[7];       /* point to next PT */ 
p[7]->word[0]  = 0x80000200; /* int to DRAM w/src updt */ 
p[7]->word[1]  = 0x00000000; /* Src address is internal */ 
p[7]->word[2]  = 0x80320000; /* Dst address is DRAM */ 
p[7]->word[3]  = 0x00001000; /* Src B count Src A count */ 
p[7]->word[4]  = 0x00001000; /* Dst B count Dst A count */ 
p[7]->word[5]  = 0x00;       /* Src C count */ 
p[7]->word[6]  = 0;          /* Dst C count */ 
p[7]->word[7]  = 0x0000;     /* Src B pitch */ 
p[7]->word[8]  = 0x000;      /* Dst B pitch */ 
p[7]->word[9]  = 0x1000;     /* Src C pitch */ 
p[7]->word[10] = 0x0000;     /* Dst C pitch */ 
p[7]->word[11] = 0;          /* Src transparency upper */ 
p[7]->word[12] = 0;          /* Src transparency lower */ 
p[7]->word[13] = 0;          /* Reserved */ 
p[7]->word[14] = 0;          /* Reserved */ 


/* internal RAM -> VRAM (rows and 8 bit data) */ 

/* Can be used 4 times then change src then this can be repeated 8 times */ 

p[8]->link     = p[8];       /* point to next PT */ 
p[8]->word[0]  = 0x80000202; /* int to VRAM w/dst updt */ 
p[8]->word[1]  = 0x00000000; /* Src address is internal */ 
p[8]->word[2]  = 0xb4000240; /* Dst address is VRAM */ 
p[8]->word[3]  = 0x00000800; /* Src B count Src A count */ 
p[8]->word[4]  = 0x00030200; /* Dst B count Dst A count */ 
p[8]->word[5]  = 0x00;       /* Src C count */ 
p[8]->word[6]  = 0;          /* Dst C count */ 
p[8]->word[7]  = 0x0000;     /* Src B pitch */ 
p[8]->word[8]  = 0x280;      /* Dst B pitch */ 
p[8]->word[9]  = 0x1000;     /* Src C pitch */ 
p[8]->word[10] = 0x1000;     /* Dst C pitch */ 
p[8]->word[11] = 0;          /* Src transparency upper */ 
p[8]->word[12] = 0;          /* Src transparency lower */ 
p[8]->word[13] = 0;          /* Reserved */ 
p[8]->word[14] = 0;          /* Reserved */ 


/* internal RAM -> VRAM (rows and 8 bit data) */ 
/* Can be used 1 times then change src and dst */ 

p[9]->link     = p[9];       /* point to next PT */ 
p[9]->word[0]  = 0x80000000; /* int to VRAM w/dst updt */ 
p[9]->word[1]  = 0x00000000; /* Src address is internal */ 
p[9]->word[2]  = 0xb4000240; /* Dst address is VRAM */ 
p[9]->word[3]  = 0x00000800; /* Src B count Src A count */ 
p[9]->word[4]  = 0x00030200; /* Dst B count Dst A count */ 
p[9]->word[5]  = 0x00;       /* Src C count */ 
p[9]->word[6]  = 0;          /* Dst C count */ 
p[9]->word[7]  = 0x0000;     /* Src B pitch */ 
p[9]->word[8]  = 0x500;      /* Dst B pitch */ 
p[9]->word[9]  = 0x1000;     /* Src C pitch */ 
p[9]->word[10] = 0x0008;     /* Dst C pitch */ 
p[9]->word[11] = 0;          /* Src transparency upper */ 
p[9]->word[12] = 0;          /* Src transparency lower */ 
p[9]->word[13] = 0;          /* Reserved */ 
p[9]->word[14] = 0;          /* Reserved */ 
/* internal -> DRAM coefficient blocks (16 bit data) */ 

p[10]->link     = p[10];      /* point to next PT */ 
p[10]->word[0]  = 0x80000000; /* Contig. Mem to int w/dst updt */ 
p[10]->word[1]  = 0x00008000; /* Src address is DRAM */ 
p[10]->word[2]  = 0x80320000; /* Dst address is internal */ 
p[10]->word[3]  = 0x00000400; /* Src B count Src A count */ 
p[10]->word[4]  = 0x000f0040; /* Dst B count Dst A count */ 
p[10]->word[5]  = 0x00;       /* Src C count */ 
p[10]->word[6]  = 0;          /* Dst C count */ 
p[10]->word[7]  = 0x000;      /* Src B pitch */ 
p[10]->word[8]  = 0x0400;     /* Dst B pitch */ 
p[10]->word[9]  = 0x1000;     /* Src C pitch */ 
p[10]->word[10] = 0x40;       /* Dst C pitch */ 
p[10]->word[11] = 0;          /* Src transparency upper */ 
p[10]->word[12] = 0;          /* Src transparency lower */ 
p[10]->word[13] = 0;          /* Reserved */ 
p[10]->word[14] = 0;          /* Reserved */ 

/* DRAM -> internal RAM (byte stream 8 bit data) */ 

p[12]->link     = p[12];      /* point to next PT */ 
p[12]->word[0]  = 0x80000200; /* SDRAM to internal w/src updt */ 
p[12]->word[1]  = 0x80380000; /* Src address is external */ 
p[12]->word[2]  = 0x01000630; /* Dst address is internal PRAM */ 
p[12]->word[3]  = 0;          /* Src B count Src A count */ 
p[12]->word[4]  = 0;          /* Dst B count Dst A count */ 
p[12]->word[5]  = 0x00;       /* Src C count */ 
p[12]->word[6]  = 0;          /* Dst C count */ 
p[12]->word[7]  = 0x0000;     /* Src B pitch */ 
p[12]->word[8]  = 0x000;      /* Dst B pitch */ 
p[12]->word[9]  = 0;          /* Src C pitch */ 
p[12]->word[10] = 0x1000;     /* Dst C pitch */ 
p[12]->word[11] = 0;          /* Src transparency upper */ 
p[12]->word[12] = 0;          /* Src transparency lower */ 
p[12]->word[13] = 0;          /* Reserved */ 
p[12]->word[14] = 0;          /* Reserved */ 


/* IE = disable() & 0xfbf0fffe; */ 
/* IE = 0x01; */ 

/* INTPEN = 0xF0000; */ 


vert_loop_index[0] = 16; 
vert_loop_index[1] = 4; 
vert_loop_index[2] = 1; 
vert_loop_index[3] = 1; 
vert_loop_index[4] = 1; 

horiz_loop_index[0] = 15; 
horiz_loop_index[1] = 5; 
horiz_loop_index[2] = 1; 
horiz_loop_index[3] = 1; 
horiz_loop_index[4] = 1; 

jump_col[0] = 16; 
jump_col[1] = 64; 
jump_col[2] = 256; 
jump_col[3] = 256; 
jump_col[4] = 256; 

jump_row[0] = 0x1000; /* not used */ 
jump_row[1] = 0x3000; 
jump_row[2] = 0xF000; 
jump_row[3] = 0x10000; 
jump_row[4] = 0x10000; 


init_alloc_table(2); 


/******************************************************************************/ 


/* Setup Initial graphics on output screen */ 


/* DRAM (8 bit data) -> VRAM (raw image) */ 


p[2]->link     = p[2];       /* point to next PT */ 
p[2]->word[0]  = 0x80000000; /* linear to VRAM */ 
p[2]->word[1]  = 0x80300000; /* Src address is DRAM */ 
p[2]->word[2]  = 0xb4000440; /* Dst address is VRAM */ 
p[2]->word[3]  = 0x00007800; /* Src B count Src A count 64x480 image */ 
p[2]->word[4]  = 0x01df0040; /* Dst B count Dst A count */ 
p[2]->word[5]  = 0x00;       /* Src C count */ 
p[2]->word[6]  = 0;          /* Dst C count */ 
p[2]->word[7]  = 0x00;       /* Src B pitch */ 
p[2]->word[8]  = 0x280;      /* Dst B pitch */ 
p[2]->word[9]  = 0x00;       /* Src C pitch */ 
p[2]->word[10] = 0x0000;     /* Dst C pitch */ 
p[2]->word[11] = 0;          /* Src transparency upper */ 
p[2]->word[12] = 0;          /* Src transparency lower */ 
p[2]->word[13] = 0;          /* Reserved */ 
p[2]->word[14] = 0;          /* Reserved */ 

p[2]->word[1] = (long)&logo; 

*ptr = (long) p[2]; 

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from external to VRAM */ 

i = i + 1; 
i = i - 1; 

while(PKTREQ & 0x02); 




p[2]->word[2] = 0xb4000200; /* Dst address is VRAM */ 

p[2]->word[3] = 0x00000940; /* Src B count Src A count 64x37 size 

image */ 

p[2]->word[4] = 0x00240040; /* Dst B count Dst A count */ 

p[2]->word[1] = (long)&dsp_util; 

*ptr = (long) p[2]; 

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from external to VRAM */ 

i = i + 1; 
i = i - 1; 

while(PKTREQ & 0x02); 


p[2]->word[2] = 0xb400ae80; /* Dst address is VRAM */ 
p[2]->word[3] = 0x00000940; /* Src B count Src A count 64x37 size image */ 
p[2]->word[4] = 0x00240040; /* Dst B count Dst A count */ 
p[2]->word[1] = (long)&frame_rate; 

*ptr = (long) p[2]; 

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from external to VRAM */ 

i = i + 1; 
i = i - 1; 

while(PKTREQ & 0x02); 


/* DRAM (8 bit data) -> VRAM (raw image) */ 

p[2]->link     = p[2];       /* point to next PT */ 
p[2]->word[0]  = 0x80000000; /* linear to VRAM */ 
p[2]->word[1]  = 0x80300000; /* Src address is DRAM */ 
p[2]->word[2]  = 0xb4000240; /* Dst address is VRAM */ 
p[2]->word[3]  = 0x00038000; /* Src B count Src A count */ 
p[2]->word[4]  = 0x00ff0200; /* Dst B count Dst A count */ 
p[2]->word[5]  = 0x00;       /* Src C count */ 
p[2]->word[6]  = 0;          /* Dst C count */ 
p[2]->word[7]  = 0x8000;     /* Src B pitch */ 
p[2]->word[8]  = 0x800;      /* Dst B pitch */ 
p[2]->word[9]  = 0x00;       /* Src C pitch */ 
p[2]->word[10] = 0x0000;     /* Dst C pitch */ 
p[2]->word[11] = 0;          /* Src transparency upper */ 
p[2]->word[12] = 0;          /* Src transparency lower */ 
p[2]->word[13] = 0;          /* Reserved */ 
p[2]->word[14] = 0;          /* Reserved */ 
#ifdef HOST 
/* 

** Tell host we are ready to start decoding 
*/ 

ReturnVal = CilRaiseSignalNumber(C80ReadyBit); 
if (ReturnVal != CIL_OK) 

{ 

/*** Cannot raise signal ***/ 
while (1); 
return; 

} 

#endif 

while (1) 

{ 


/* TCOUNT = 0xffffffff; */ 

if (TCOUNT == 0) 

TCOUNT = 0xffffffff; 

p[9]->word[2] = 0xb4000240; /* Dst address is VRAM */ 

p[7]->word[3] = 0x00001000; 
p[7]->word[4] = 0x00001000; 
p[7]->word[6] = 0x00000000; 
p[7]->word[8] = 0x00000000; 
p[7]->word[10] = 0x00001000; 


/* 

** Wait for request from host 
*/ 

time_in_wait = TCOUNT; 


CilReadMailbox(0,(PUINT32)ping_pong_addr); 

while( *ping_pong_addr == ptr_tail) 

{ 

for (i=0;i<1000;i++) 

{ 

i = i + 1; 
i = i - 1; 

} 

CilReadMailbox(0,(PUINT32)ping_pong_addr); 

} 

stream_address = 0x90013000 + (ptr_tail * 0x4000); 

ptr_tail = ptr_tail + 1; 

if (ptr_tail == d_buffers) ptr_tail = 0; 
/* Do time calculations */ 

time_in_wait = time_in_wait - TCOUNT; 

frame_time = (frame_hit - TCOUNT)*0.000025; 
frame_hit = TCOUNT; 

decode_time = TCOUNT; 

/* Read dynamic values from the bitstream */ 

temp_ulong = NOCACHE_INT(*(int *) stream_address); 

first_dec = ((temp_ulong & 0xff000000) >> 24); 

tbuf_pp0[0] = first_dec; 
tbuf_pp1[0] = first_dec; 
tbuf_pp2[0] = first_dec; 
tbuf_pp3[0] = first_dec; 

global_mean = ((temp_ulong & 0x00ff0000) >> 16); 
global_maxval = ((temp_ulong & 0x0000ff00) >> 8); 
*changemod = (temp_ulong & 0x000000ff); 


#ifndef variable_compression 

/* Check for change in compression */ 

temp_ulong = (NOCACHE_INT(*(int *) (stream_address + 4))) & 0x000000ff; 

if (temp_ulong != last_comp) 

{ 

init_alloc_table(temp_ulong); 
last_comp = temp_ulong; 

} 

#else 

/* Check for change in compression */ 

temp_ulong = (NOCACHE_INT(*(int *) (stream_address + 4))); 

if (temp_ulong < 480) temp_ulong = 480; 
if (temp_ulong > 16640) temp_ulong = 16640; 

if (temp_ulong != last_comp) 

{ 

init_alloc_table2(temp_ulong); 
last_comp = temp_ulong; 

} 

#endif 
/***********************************************************************/ 

/************************** Decode section *****************************/ 

/***********************************************************************/ 

/* Do Bit stream conversion to coefficients */ 

/* 128 byte blocks make up 32x32 coefficient blocks that are organized */ 

/* as 8x16 */ 

#ifdef DECODE_STREAM 

ppnexttask[0] = 0; 
ppnexttask[1] = 0; 
ppnexttask[2] = 0; 
ppnexttask[3] = 0; 

current_block = 0; 
ppdone = 0; 

p[12]->word[1] = stream_address + 8; /* Read Src address from queued requests */ 

while((INTPEN & 0xF0000)!=0xF0000); /* Wait for all PPs to catch up */ 

*pp_stop_encode = 0; 

command(0x0000200F); /* send msg interrupt to all PPs */ 

while (ppdone!=4) 

{ 

if ((INTPEN & 0xF0000)!=0x00) 

{ 

DONE1 = 0; 

/* find out which PP requested service */ 

for (pprequesting=0;pprequesting<4;pprequesting++) 

{ 

if ((INTPEN & (1<<(16+pprequesting))) != 0) break; 

} 

/* clear the interrupt flag */ 

INTPEN = 0x10000<<pprequesting; 

/* Get third pair of coefficients from PP and send in next block of bits to 
decode if not complete */ 

if ((ppnexttask[pprequesting] == 3) && (!DONE1)) 

{ 

p[10]->word[1] = 0x00008000 + (pprequesting<<12); /* Src address is internal RAM2 */ 

p[10]->word[2] = 0x80348000 + (ppinfo[pprequesting]>>3)*0x04000 + 
(ppinfo[pprequesting]&7)*64; /* Dst for block changes */ 

*ptr = (long) p[10]; 

/* kick off 1024 byte (32x16 words) coefficient block transfer from 
internal to SDRAM */ 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02); 
p[10]->word[1] = p[10]->word[1] + 0x400; /* Src for block changes is internal RAM2 */ 

p[10]->word[2] = p[10]->word[2] + 0x200; /* Dst address is SDRAM */ 

/* kick off 1024 byte (32x16 words) coefficient block transfer from 
internal to SDRAM */ 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02); 

ppnexttask[pprequesting] = 1; 

if (current_block != 40) 

{ 

/* read in TBYTES number of bytes from the bit stream for this block to 
the appropriate PP */ 

table_size[pprequesting] = local_alloc[current_block]; 

ppinfo[pprequesting] = current_block; 
ppnexttask[pprequesting] = 1; 
current_block = current_block + 1; 

*(unsigned short *)(0x01000600 + (pprequesting<<12)) = 
table_size[pprequesting] - *changemod; 

p[12]->word[2] = 0x01000630 + (pprequesting<<12); /* Dst address is internal Parameter RAM */ 

*ptr = (long) p[12]; 

/* Read the number of bytes that need to be transferred */ 

p[12]->word[3] = p[12]->word[4] = p[12]->word[9] = 
table_size[pprequesting]; 

/* write out TBYTES value for this block to the appropriate PP */ 

*(unsigned short *)(0x01000600 + (pprequesting<<12)) = 
table_size[pprequesting] - *changemod; 

/* kick off transfer from SDRAM to internal */ 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02); 

command(0x00002000 + (1<<pprequesting)); /* send msg interrupt to PP that requested service */ 

} 

else ppdone = ppdone + 1; 

DONE1 = 1; 

} 

/* Get first or second pair of coefficients from PP */ 

if (((ppnexttask[pprequesting] > 0) && (ppnexttask[pprequesting] < 3)) && 
(!DONE1)) 

{ 

p[10]->word[1] = 0x00008000 + (pprequesting<<12); /* Src address is internal RAM2 */ 

p[10]->word[2] = 0x80320000 + (ppinfo[pprequesting]>>3)*0x04000 + 
(ppinfo[pprequesting]&7)*64 + 0x14000*(ppnexttask[pprequesting]-1); /* Dst for block changes */ 

*ptr = (long) p[10]; 

/* kick off 2048 byte coefficient block transfer from internal to SDRAM */ 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02); 

p[10]->word[1] = p[10]->word[1] + 0x400; /* Src for block changes is internal RAM2 */ 

p[10]->word[2] = p[10]->word[2] + 0x200; /* Dst address is SDRAM */ 

/* kick off 1024 byte (32x16 words) coefficient block transfer from 
internal to SDRAM */ 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02); 

command(0x00002000 + (1<<pprequesting)); /* send msg interrupt to PP that requested service */ 

ppnexttask[pprequesting] += 1; 

DONE1 = 1; 

} 

/* Give PP a block of bits to decode */ 

/* Move bitstream from SDRAM to internal */ 

if ((ppnexttask[pprequesting] == 0) && (!DONE1)) 

{ 

/* read in TBYTES number of bytes from the bit stream for this block to the 
appropriate PP */ 

table_size[pprequesting] = local_alloc[current_block]; 

ppinfo[pprequesting] = current_block; 

ppnexttask[pprequesting] = 1; 

current_block = current_block + 1; 

p[12]->word[2] = 0x01000630 + (pprequesting<<12); /* Dst address is internal Parameter RAM */ 

*ptr = (long) p[12]; 

/* Read the number of bytes that need to be transferred */ 
p[12]->word[3] = p[12]->word[4] = p[12]->word[9] = 
table_size[pprequesting]; 

/* write out TBYTES value for this block to the appropriate PP */ 

*(unsigned short *)(0x01000600 + (pprequesting<<12)) = 
table_size[pprequesting] - *changemod; 

/* kick off transfer from SDRAM to internal */ 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02); 

command(0x00002000 + (1<<pprequesting)); /* send msg interrupt to PP that requested service */ 

DONE1 = 1; 

} 
} 

} 


*pp_stop_encode = 1; 

command(0x0000200F); /* send msg interrupt to all PPs */ 

#endif 

/* FIRST variable is sent in with the bitstream */ 

/* Determine new FIRST variable common to all PPs */ 

/* select minimum FIRST value */ 

/* min_first = tbuf_pp0[1]; 

if (tbuf_pp1[1] < min_first) min_first = tbuf_pp1[1]; 

if (tbuf_pp2[1] < min_first) min_first = tbuf_pp2[1]; 

if (tbuf_pp3[1] < min_first) min_first = tbuf_pp3[1]; 

*/ 

/* copy it to all PPs */ 

/* tbuf_pp0[0] = min_first; 
tbuf_pp1[0] = min_first; 
tbuf_pp2[0] = min_first; 
tbuf_pp3[0] = min_first; 

*/ 

CilWriteMailbox(1, ptr_tail); 

#ifdef DECODE 

for (lp=4; lp>=0; lp--) 

{ 

/* Do 32 rows */ 

for (i=0; i<horiz_loop_index[lp]; i++) 

{ 

if (i==0) 

{ 

p[6]->word[1] = 0x80320000; /* Src address is SDRAM */ 
p[7]->word[2] = 0x80320000; /* Dst address is SDRAM */ 

} 

p[6]->word[2] = 0x00000000; /* Dst address is internal */ 

p[7]->word[1] = 0x00000000; /* Src address is internal */ 

*ptr = (long) p[6]; 

if (i==0) 

{ 

switch (lp) { 
case 0: 

{ 

p[6]->word[3] = 0x00001000; 
p[6]->word[4] = 0x00001000; 
p[6]->word[5] = 0x00000000; 
p[6]->word[7] = 0x00000000; 
p[6]->word[9] = 0x00000000; 

p[7]->word[3] = 0x00001000; 
p[7]->word[4] = 0x00001000; 
p[7]->word[6] = 0x00000000; 
p[7]->word[8] = 0x00000000; 
p[7]->word[10] = 0x00000000; 
break; 

} 

case 1: 

{ 

p[6]->word[3] = 0x00ff0002; 
p[6]->word[4] = 0x00000c00; 
p[6]->word[5] = 0x00000005; 
p[6]->word[7] = 0x00000004; 
p[6]->word[9] = 0x00000800; 

p[7]->word[3] = 0x00000c00; 
p[7]->word[4] = 0x00ff0002; 
p[7]->word[6] = 0x00000005; 
p[7]->word[8] = 0x00000004; 
p[7]->word[10] = 0x00000800; 
break; 

} 

case 2: 

{ 

p[6]->word[3] = 0x007f0002; 
p[6]->word[4] = 0x00000f00; 
p[6]->word[5] = 0x0000000e; 
p[6]->word[7] = 0x00000008; 
p[6]->word[9] = 0x00001000; 

p[7]->word[3] = 0x00000f00; 
p[7]->word[4] = 0x007f0002; 
p[7]->word[6] = 0x0000000e; 
p[7]->word[8] = 0x00000008; 
p[7]->word[10] = 0x00001000; 
break; 

} 

case 3: 

{ 

p[6]->word[3] = 0x003f0002; 
p[6]->word[4] = 0x00000400; 
p[6]->word[5] = 0x00000007; 
p[6]->word[7] = 0x00000010; 
p[6]->word[9] = 0x00002000; 

p[7]->word[3] = 0x00000400; 
p[7]->word[4] = 0x003f0002; 
p[7]->word[6] = 0x00000007; 
p[7]->word[8] = 0x00000010; 
p[7]->word[10] = 0x00002000; 
break; 

} 

case 4: 

{ 

p[6]->word[3] = 0x001f0002; 
p[6]->word[4] = 0x00000100; 
p[6]->word[5] = 0x00000003; 
p[6]->word[7] = 0x00000020; 
p[6]->word[9] = 0x00004000; 

p[7]->word[3] = 0x00000100; 
p[7]->word[4] = 0x001f0002; 
p[7]->word[6] = 0x00000003; 
p[7]->word[8] = 0x00000020; 
p[7]->word[10] = 0x00004000; 
break; 

} 

default: 
break; 

} 

} 


/* kick off transfer from SDRAM to internal */ 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02); 

#ifndef DEBUG 

command(0x00002001); /* send msg interrupt to PP0 */ 

#endif 

p[6]->word[1] = p[6]->word[1] + jump_row[lp]; 

/* kick off transfer from SDRAM to internal */ 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02); 

#ifndef DEBUG 

command(0x00002002); /* send msg interrupt to PP1 */ 

#endif 

p[6]->word[1] = p[6]->word[1] + jump_row[lp]; 

/* kick off transfer from SDRAM to internal */ 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02); 

#ifndef DEBUG 

command(0x00002004); /* send msg interrupt to PP2 */ 

#endif 

p[6]->word[1] = p[6]->word[1] + jump_row[lp]; 

/* kick off transfer from SDRAM to internal */ 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02); 

#ifndef DEBUG 

command(0x00002008); /* send msg interrupt to PP3 */ 

#endif 

p[6]->word[1] = p[6]->word[1] + jump_row[lp]; 

#ifndef DEBUG 

while((INTPEN & 0x10000)==0x00); /* poll PP0 */ 

INTPEN = 0x10000; /* clear the interrupt flag */ 

#endif 

if (lp==0) 

*ptr = (long) p[9]; 
else 

*ptr = (long) p[7]; 

p[9]->word[1] = 0x00000000; /* Dst address is VRAM */ 

/* start xfer from internal to DRAM */ 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02); 




if (lp==0) 

{ 

p[9]->word[2] = p[9]->word[2] + 0x280; 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02); 
p[9]->word[2] = p[9]->word[2] - 0x280 + 5120; 
p[9]->word[1] = p[9]->word[1] + 0x1000; 

} 

p[7]->word[2] = p[7]->word[2] + jump_row[lp]; 

#ifndef DEBUG 

while((INTPEN & 0x20000)==0x00); /* poll PP1 */ 

INTPEN = 0x20000; /* clear the interrupt flag */ 

#endif 

/* start xfer from internal to DRAM */ 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02); 

if (lp==0) 

{ 

p[9]->word[2] = p[9]->word[2] + 0x280; 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02); 
p[9]->word[2] = p[9]->word[2] - 0x280 + 5120; 
p[9]->word[1] = p[9]->word[1] + 0x1000; 

} 

p[7]->word[2] = p[7]->word[2] + jump_row[lp]; 

#ifndef DEBUG 

while((INTPEN & 0x40000)==0x00); /* poll PP2 */ 

INTPEN = 0x40000; /* clear the interrupt flag */ 

#endif 

/* start xfer from internal to DRAM */ 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02); 

if (lp==0) 

{ 

p[9]->word[2] = p[9]->word[2] + 0x280; 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02); 
p[9]->word[2] = p[9]->word[2] - 0x280 + 5120; 
p[9]->word[1] = p[9]->word[1] + 0x1000; 

} 

p[7]->word[2] = p[7]->word[2] + jump_row[lp]; 

#ifndef DEBUG 

while((INTPEN & 0x80000)==0x00); /* poll PP3 */ 

INTPEN = 0x80000; /* clear the interrupt flag */ 

#endif 

/* start xfer from internal to DRAM */ 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02); 

if (lp==0) 

{ 

p[9]->word[2] = p[9]->word[2] + 0x280; 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02); 
p[9]->word[2] = p[9]->word[2] - 0x280 + 5120; 
p[9]->word[1] = p[9]->word[1] + 0x1000; 

} 

p[7]->word[2] = p[7]->word[2] + jump_row[lp]; 
} /* end for i */ 


/* Do the appropriate number of columns depending on the scale */ 
if (lp!=0) 

for (i=0; i<vert_loop_index[lp-1]; i++) 

{ 

if (i==0) 

{ 

p[4]->word[1] = 0x80320000; /* Src address is DRAM */ 
p[5]->word[2] = 0x80320000; /* Dst address is DRAM */ 

} 

p[4]->word[2] = 0x00000000; /* Dst address is internal */ 

*ptr = (long) p[4]; 

#ifdef DEBUG 
/* JCWtest */ 
if (i==1) 

{ 

for (junki=0;junki<256*256;junki=junki+1) 

{ 

fillme[junki] = junkfill; 
junkfill = junkfill + 1; 

} 

} 

#endif 


if (i—0) 

{ 

switch (lp) { 
case 1: 

{ 

p[4]->word[3] = 0x00ef0010;
p[4]->word[4] = 0x00000f00;
p[4]->word[5] = 0x00000000;
p[4]->word[7] = 0x00000400;
p[4]->word[9] = 0x00000000;

p[5]->word[3] = 0x00000f00;
p[5]->word[4] = 0x00ef0010;
p[5]->word[6] = 0x00000000;
p[5]->word[8] = 0x00000400;
p[5]->word[10] = 0x00000000;
break;

}

case 2:

{

p[4]->word[3] = 0x000f0002;
p[4]->word[4] = 0x00000f00;
p[4]->word[5] = 0x00000077;
p[4]->word[7] = 0x00000004;
p[4]->word[9] = 0x00000800;

p[5]->word[3] = 0x00000f00;
p[5]->word[4] = 0x000f0002;
p[5]->word[6] = 0x00000077;
p[5]->word[8] = 0x00000004;
p[5]->word[10] = 0x00000800;
break; 

} 

case 3: 

{ 

p[4]->word[3] = 0x001f0002;
p[4]->word[4] = 0x00000f00;
p[4]->word[5] = 0x0000003b;
p[4]->word[7] = 0x00000008;
p[4]->word[9] = 0x00001000;

p[5]->word[3] = 0x00000f00;
p[5]->word[4] = 0x001f0002;
p[5]->word[6] = 0x0000003b;
p[5]->word[8] = 0x00000008;
p[5]->word[10] = 0x00001000;
break;

}

case 4:

{

p[4]->word[3] = 0x000f0002;
p[4]->word[4] = 0x000003c0;
p[4]->word[5] = 0x0000001d;
p[4]->word[7] = 0x00000010;
p[4]->word[9] = 0x00002000;

p[5]->word[3] = 0x000003c0;
p[5]->word[4] = 0x000f0002;
p[5]->word[6] = 0x0000001d;
p[5]->word[8] = 0x00000010;
p[5]->word[10] = 0x00002000;
break; 

} 

default: 
break; 

} 

} 

/* kick off TC to transfer columns of pixels from DRAM to internal */ 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

#ifndef DEBUG 

command(0x00002001); /* send msg interrupt to PP0 */ 

#endif 

if ((i==0)&(lp==1))

time_colin = 0xffffffff - TCOUNT;

p[4]->word[1] = p[4]->word[1] + jump_col[lp-1]; /* Src address is DRAM */

/* kick off TC to transfer columns of pixels from DRAM to internal */ 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02); 

#ifndef DEBUG 

command(0x00002002); /* send msg interrupt to PP1 */ 

#endif 

p[4]->word[1] = p[4]->word[1] + jump_col[lp-1]; /* Src address is DRAM */

/* kick off TC to transfer columns of pixels from DRAM to internal */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

#ifndef DEBUG 

command(0x00002004); /* send msg interrupt to PP2 */ 

#endif 

p[4]->word[1] = p[4]->word[1] + jump_col[lp-1]; /* Src address is DRAM */

/* kick off TC to transfer columns of pixels from DRAM to internal */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

#ifndef DEBUG

command(0x00002008); /* send msg interrupt to PP3 */ 

#endif 


p[4]->word[1] = p[4]->word[1] + jump_col[lp-1]; /* Src address is DRAM */

p[5]->word[1] = 0x00000000; /* Src address is internal */

*ptr = (long) p[5];



#ifndef DEBUG 

while((INTPEN & 0x10000)==0x00); /* poll PP0 */
INTPEN = 0x10000; /* clear the interrupt flag */


#endif 


/* kick off transfer from internal to DRAM */ 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);


if (i==0)

time_colout = 0xffffffff - TCOUNT;

p[5]->word[2] = p[5]->word[2] + jump_col[lp-1]; /* Dst address is DRAM */


#ifndef DEBUG 

while((INTPEN & 0x20000)==0x00); /* poll PP1 */ 

INTPEN = 0x20000; /* clear the interrupt flag */ 

#endif 

/* kick off transfer from internal to DRAM */ 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02); 

p[5]->word[2] = p[5]->word[2] + jump_col[lp-1]; /* Dst address is DRAM */


#ifndef DEBUG 

while((INTPEN & 0x40000)==0x00); /* poll PP2 */ 

INTPEN = 0x40000; /* clear the interrupt flag */

#endif 

/* kick off transfer from internal to DRAM */ 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

p[5]->word[2] = p[5]->word[2] + jump_col[lp-1]; /* Dst address is DRAM */

#ifndef DEBUG

while((INTPEN & 0x80000)==0x00); /* poll PP3 */ 

INTPEN = 0x80000; /* clear the interrupt flag */ 

#endif 

/* kick off transfer from internal to DRAM */ 

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02); 

p[5]->word[2] = p[5]->word[2] + jump_col[lp-1]; /* Dst address is DRAM */

} /* end for i */

} /* end for lp */

last_dtime = decode_time; 
decode_time = decode_time - TCOUNT; 

times = junk_time; 
junk_time = TCOUNT; 

/* DRAM (8 bit data) -> VRAM (raw image) */

p[2]->link = p[2]; /* point to next PT */
p[2]->word[0] = 0x80000000; /* linear to VRAM */
p[2]->word[1] = 0x80300000; /* Src address is DRAM */
p[2]->word[2] = 0xb4000240; /* Dst address is VRAM */
p[2]->word[3] = 0x00110010; /* Src B count Src A count */
p[2]->word[4] = 0x00110010; /* Dst B count Dst A count */
p[2]->word[5] = 0x00; /* Src C count */
p[2]->word[6] = 0; /* Dst C count */
p[2]->word[7] = 0xb0; /* Src B pitch */
p[2]->word[8] = 0x280; /* Dst B pitch */
p[2]->word[9] = 0x00; /* Src C pitch */
p[2]->word[10] = 0x0000; /* Dst C pitch */
p[2]->word[11] = 0; /* Src transparency upper */
p[2]->word[12] = 0; /* Src transparency lower */
p[2]->word[13] = 0; /* Reserved */
p[2]->word[14] = 0; /* Reserved */

/* times = times + 1;
if (times == 10) times =0; */

#ifdef SHOW_WAITFORHOST
/* sprintf(string1,"%d",times); */
sprintf(string1,"%f", (times*0.00002));

for (i=0;i<4;i++)

{

if (string1[i]!='.')

{

temp_str = string1[i];
temp_str = temp_str - 48;
temp_str = temp_str * 16;

}

else

{

temp_str = 160;

}

p[2]->word[1] = (long)&number_pixels;
p[2]->word[1] += temp_str;
p[2]->word[2] = 0xb4042b00 + i*16;

*ptr = (long) p[2];

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from internal to VRAM */

i = i + 1; 
i = i - 1; 

while(PKTREQ & 0x02); 

} 

#endif 

#ifdef SHOW_DECODETIME 

/* temp_time = 1.0/(decode_time*0.00000002); */

temp_time = (decode_time*0.00002); /* time in milliseconds */

sprintf(string1,"%f",temp_time);

for (i=0;i<4;i++)

{

if (string1[i]!='.')

{

temp_str = string1[i];
temp_str = temp_str - 48;
temp_str = temp_str * 16;

} 

else 

{ 

temp_str = 160; 

} 

p[2]->word[1] = (long)&number_pixels; 
p[2]->word[1] += temp_str; 
p[2]->word[2] = 0xb4045580 + i*16; 

*ptr = (long) p[2]; 

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from internal to VRAM */ 

i = i + 1;
i = i - 1; 

while(PKTREQ & 0x02); 

} 

#endif 

#ifdef SHOW_DSPUTIL 

/*temp_time = 1.0/(decode_time*0.00000002) ; */ 

temp_time = (decode_time*100.0)/((decode_time + time_in_wait)*1.0); /* percentage of time spent processing */

sprintf(string1,"%f",temp_time);

for (i=0;i<4;i++)

{

if (string1[i]!='.')

{

temp_str = string1[i];
temp_str = temp_str - 48;
temp_str = temp_str * 16;

}

else 

{ 

temp_str = 160; 

} 

p[2]->word[1] = (long)&number_pixels; 
p[2]->word[1] += temp_str; 
p[2]->word[2] = 0xb4005e80 + i*16; 

*ptr = (long) p[2]; 

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from internal to VRAM */ 

i = i + 1; 
i = i - 1;

while(PKTREQ & 0x02); 

} 

#endif 

#ifdef SHOW_FRAMERATE

sprintf(string1,"%f", 1000.0/frame_time);

for (i=0;i<4;i++)

{

if (string1[i]!='.')

{

temp_str = string1[i];
temp_str = temp_str - 48;
temp_str = temp_str * 16;

} 

else 

{ 

temp_str = 160; 

} 

p[2]->word[1] = (long)&number_pixels;
p[2]->word[1] += temp_str;
p[2]->word[2] = 0xb4010b00 + i*16;

*ptr = (long) p[2]; 

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from internal to VRAM */ 

i = i + 1; 
i = i - 1; 

while(PKTREQ & 0x02); 

} 

#endif 

#ifdef SHOW_FRAMETIME

sprintf(string1,"%f",frame_time);

for (i=0;i<4;i++)

{

if (string1[i]!='.')

{

temp_str = string1[i];
temp_str = temp_str - 48;
temp_str = temp_str * 16;

} 

else 

{ 

temp_str = 160;

}

p[2]->word[1] = (long)&number_pixels;
p[2]->word[1] += temp_str;
p[2]->word[2] = 0xb4048000 + i*16;

*ptr = (long) p[2];

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from internal to VRAM */

i = i + 1;
i = i - 1; 


while(PKTREQ & 0x02); 

} 


#endif 

/* DRAM (8 bit data) -> VRAM (raw image) */

p[2]->link = p[2]; /* point to next PT */
p[2]->word[0] = 0x80000000; /* linear to VRAM */
p[2]->word[1] = 0x80300000; /* Src address is DRAM */
p[2]->word[2] = 0xb4000240; /* Dst address is VRAM */
p[2]->word[3] = 0x00038000; /* Src B count Src A count */
p[2]->word[4] = 0x00ff0200; /* Dst B count Dst A count */
p[2]->word[5] = 0x00; /* Src C count */
p[2]->word[6] = 0; /* Dst C count */
p[2]->word[7] = 0x8000; /* Src B pitch */
p[2]->word[8] = 0x800; /* Dst B pitch */
p[2]->word[9] = 0x00; /* Src C pitch */
p[2]->word[10] = 0x0000; /* Dst C pitch */
p[2]->word[11] = 0; /* Src transparency upper */
p[2]->word[12] = 0; /* Src transparency lower */
p[2]->word[13] = 0; /* Reserved */
p[2]->word[14] = 0; /* Reserved */


command(0x0000200F); /* send msg interrupt to PP1 */


#endif 

/* ptr_tail = ptr_tail + 1; 

if (ptr_tail == d_buffers) ptr_tail =0; */ 

#ifdef dumb_HOST 
/* 

** Tell host this frame is completed decoding 
*/ 

ReturnVal = CilRaiseSignalNumber(C80DoneBit); 
if (ReturnVal != CIL_OK) 

{ 

/*** Cannot raise signal ***/ 
while(1); 
return; 
}

#endif 

junk_time = junk_time - TCOUNT; 

} /* end while */ 

} /* end task */ 

extern int ep_runpp0; 

main()

{ 

int i; 

unsigned int temp; 

unsigned int *src_ptr = (unsigned int *)0x90018000; 
unsigned int *dst_ptr = (unsigned int *)0x80000000; 

/* REFCNTL = 0xffff0138; */ /* set up dram and sdram to correct refresh rate for 40 MHz C80 */

REFCNTL = 0xffff0186; /* set up dram and sdram to correct refresh rate for 50 MHz C80 */


command(0xc000000f); /* reset and halt PP0,1,2,3 */

*(int *)0x010001b8 = (int)&ep_runpp0; /* initialize task vector */

*(int *)0x010011b8 = (int)&ep_runpp0; /* initialize task vector */

*(int *)0x010021b8 = (int)&ep_runpp0; /* initialize task vector */

*(int *)0x010031b8 = (int)&ep_runpp0; /* initialize task vector */


/* upload PP code */ 

/* memcpy( 0x80000000, 0x90200000, 0x9020bae0 - 0x90200000); */ 

for (i=0;i<20000;i++) 

{ 

temp = *src_ptr++; 

*dst_ptr++ = temp;

} 

for (i=0;i<20000;i++) 

{ 

temp = temp + 1; 
temp = temp -1; 

} 

command(0x3000000f); /* start PP0,1,2,3 by unhalting it */ 

/* all will take its task interrupt*/ 


/* Basic init functions */ 

#ifdef HOST 

InterruptInit(); /* Init ME interrupts */

#endif 


PtReqInit(); /* Init the ME PT functions */
TaskInitTasking(); /* Init tasking */
IclInstallPtdMalloc(); /* Install protected malloc and free function to ME */
IclPTInit(15); /* Init the Icl PT server task with a priority of 15 */

#ifdef HOST
/* 

** Initialise the Cil 

** Declare 4 buffers of 256 bytes each. 

** These buffers are not used here - choose minimum sizes. 

*/ 

CilInit(4,256);

#endif 

TaskResume(TaskCreate(-1,task, NULL, 14, 4096)); /* Start task */ 

while (1==1); /* loop */

} 


/*******************************************************************
* *
* Function : SignalHandler *
* Args : UINT32 Signals *
* *
* Description: *
* Signals - Signals raised by host *
* *
* SignalHandler will be called when host raises signal *
* *
* Return Values: *
* None *
* *
*******************************************************************/


void 

SignalHandler(UINT32 Signals) 

{ 

if ((Signals & HostRequestBitMask) !=0) 

{ 

/* 

** Read where shared data can be found 
*/ 

ReturnVal = CilReadMailbox(0,(PUINT32)ping_pong_addr);

/* ReturnVal = CilReadMailbox(1, (PUINT32)changemod); */

/* buf_addr[ptr_head] = *ping_pong_addr; */

if (ReturnVal != CIL_OK)

{ 

/*** Cannot read from mailbox ***/ 
while (1); 

} 

/* ptr_head = ptr_head + 1; 

if (ptr_head == d_buffers) ptr_head = 0; */ 

} 

} 

/* Initialize the adaptive bit allocation tables on the PPs */ 
void 

init_alloc_table(UINT32 index) 

{ 

int t_alloc;
int p_alloc;
int r_alloc;
int k,lp;

t_alloc = YSIZE*XSIZE/comp_table[index];
p_alloc = t_alloc/((AY*AX)/6);
r_alloc = t_alloc%((AY*AX)/6);

for(k=0;k<((AY*AX)/6);k++)
local_alloc[k] = p_alloc;


lp = 0;

while (r_alloc>0)

{

local_alloc[lp++]++;
r_alloc--;

}

}

void 

init_alloc_table2(UINT32 index) 

{ 

int t_alloc; 
int p_alloc; 
int r_alloc;
int k,lp; 

t_alloc = index - headersize; 
p_alloc = t_alloc/((AY*AX)/6); 
r_alloc = t_alloc%((AY*AX)/6); 

for(k=0;k<((AY*AX)/6);k++) 
local_alloc[k] = p_alloc; 


lp = 0;

while (r_alloc>0) 

{ 

local_alloc[lp++] ++; 
r_alloc--; 

} 


} 
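
Both allocation routines above follow the same pattern: a total budget is divided evenly over (AY*AX)/6 table entries, and the remainder is handed out one unit at a time starting from the first entry. The following is a minimal host-side sketch of that pattern in portable C. The names `distribute_budget` and `table_sum`, the parameter list, and the caller-supplied array are illustrative assumptions and do not appear in the original listing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not in the original listing): split `total` units
 * evenly over `n` table entries, then give the first (total % n) entries
 * one extra unit so the entries sum exactly to `total`. */
static void distribute_budget(int total, int *table, size_t n)
{
    assert(n > 0 && total >= 0);
    int per_entry = total / (int)n;   /* plays the role of p_alloc */
    int remainder = total % (int)n;   /* plays the role of r_alloc */

    for (size_t k = 0; k < n; k++)
        table[k] = per_entry;

    /* Hand out the remainder from the front of the table. */
    for (size_t k = 0; remainder > 0; k++, remainder--)
        table[k]++;
}

/* Sum of a table, useful for checking that no units were lost. */
static int table_sum(const int *table, size_t n)
{
    int sum = 0;
    for (size_t k = 0; k < n; k++)
        sum += table[k];
    return sum;
}
```

Because the remainder is always smaller than the number of entries, the second loop never runs past the end of the table, and the entries differ from one another by at most one unit.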


headersize; 
/**************************************************************************
* Filename : pcic80.cmd *
* *
* Description: *
* *
* PCI/C80 linker file *
* *
**************************************************************************/

-heap 0x100000 
-stack 0x10000 
-l mp_cio.lib
-l \pcic80\lib\mp_task.lib
-l mp_int.lib
-l mp_rts.lib
-l mp_ptreq.lib
-l mp_ppcmd.lib
-l ppcmd.lib
-l \pcic80\lib\icl.lib
-l \pcic80\lib\vol.lib
-l \pcic80\lib\bgl.lib
-l \pcic80\lib\vil24.lib
-l \pcic80\lib\cil.lib

MEMORY 

{ 

RAM0 : o=0x00000000 l = 0x00800
RAM1 : o=0x00000800 l = 0x00800
RAM2 : o=0x00008000 l = 0x00800
RESERV : o=0x01000000 l = 0x00200
MPPRAM : o=0x010007D8 l = 0x00028
SDRAM : o=0x80000000 l = 0x300000
DRAM : o=0x90000000 l = 0x18000
DRAM2 : o=0x90018000 l = 0x1e8000
UNINIT : o=0x90200000 l = 0x200000
IMAGE : o=0x80300000 l = 0x80000
SPOT : o=0x80380000 l = 0x10000
VRAM_PAL : o=0xB0000000 l = 0x200000
VRAM_VGA : o=0xB4000000 l = 0x400000

} 

SECTIONS 

{ 

/* 

* The following section must be defined for all program that 

* use the CIL. The section must appear in the first 8Mb 

* of DRAM and must be long enough to include all buffers 

* plus 128 bytes. This example is big enough for 4*256byte 

* buffers. 

* 

* See the user guide for more information. 

*/ 

.lsidram : { 

_CilDRAMBase = .; 

. += 0x600; 

} > DRAM 


.text : > DRAM

.cinit : > DRAM

.const : > DRAM

.switch : > DRAM

.data : > DRAM

.bss : > DRAM

.cio : > DRAM 

.pcinit : > DRAM 

.ptext : load > DRAM2, run SDRAM 

font : load > DRAM2, run SDRAM 

.sysmem : > UNINIT 

.stack : > UNINIT 

sh_vars : > MPPRAM 

rawimage: > IMAGE 

stream: > SPOT 


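}

/**************************************************************************
* End of file pcic80.cmd
**************************************************************************/

/**************************************************************************
* Filename : pcic80a.cmd *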


* * 

* Description: * 

* * 

* PCI/C80 linker file * 

* * 


**************************************************************************/ 

-pc 

-l d:\ravp\src\newlib\pp_rts.lib

-pstack 256 

MEMORY 

{ 

RAM0 : o=0x00000000 l = 0x00800
RAM1 : o=0x00000800 l = 0x00800
RAM2 : o=0x00008000 l = 0x00800
RESERV : o=0x01000000 l = 0x00200
PRAM : o=0x01000200 l = 0x00600
SDRAM : o=0x80000000 l = 0x800000
DRAM : o=0x90400000 l = 0x400000
VRAM_PAL : o=0xB0000000 l = 0x200000
VRAM_VGA : o=0xB4000000 l = 0x400000

}

SECTIONS 

{ 


.text : > DRAM
.ptext : > DRAM
.cinit : > DRAM
.const : > DRAM
.switch : > DRAM
.data : > DRAM
.bss : > DRAM
.cio : > DRAM
.sysmem : > DRAM
.stack : > DRAM
.pcinit : > DRAM
.pbss : (PASS) > PRAM
.psysmem : (PASS) > PRAM
.pstack : (PASS) > PRAM

}
/*******************************************************************
* *
* Function : $quick_syms *
* Args : none *
* Passed in : *
* *
* Description: *
* *
* $quick_syms will be called to calculate the number of *
* symbols needed for this pass thru the refinement pass. *
* *
* Return Values: *
* None *
* *
*******************************************************************/


stats_flag .set a0
save_array .set a1
kids_list .set a8
stats_flag_base .set a9
i .set x0
index .set x9
std .set x2
tempd1 .set d1
tempd2 .set d2
tempd3 .set d3
tempd4 .set d4

.global $quick_syms
.global $stats_flag
.global $save_array
.global $save_index
.global $syms_to_do
.global $read_more
.global $left_off
.global $link_list
.global $total_links

.align 128

$quick_syms:

stats_flag_base =uw *(pba + $stats_flag)
save_array =uw *(pba + $save_array)
kids_list = &*(pba + $link_list)
std = 0

tempd4 = 64

tempd2 =uh *(pba + $left_off)
tempd1 = 256 - tempd2

tempd3 =ub *(pba + $read_more)
tempd3 = tempd3 - 0
tempd2 =[eq.z] tempd4
tempd1 =[eq] 192

i = tempd2
le0 = endquick

lrs0 = tempd1 - 1

*(pba + $syms_to_do) =uh a15

*(pba + $save_index) =uh a15

tempd1 =ub *(pba + $total_links)

le1 = incr
lrs1 = tempd1 - 1
index = 0
tempd2 = 251

tempd1 =ub *(kids_list + index)
tempd3 = tempd1<<8

stats_flag = stats_flag_base + tempd3
index = index + 1

tempd1 =ub *(stats_flag + i)
a15 = tempd1 & 5
tempd1 =[ne.z] tempd1 & tempd2
br =[ne] incr

*(stats_flag + i) =ub tempd1
tempd1 = i + tempd3

*(save_array +[std]) =uh tempd1

a15 = std - 240
lc0 =[gt] 0
std = std + 1

incr:

nop

endquick:

i = i + 1

*(pba + $syms_to_do) =uh std

*(pba + $left_off) =uh i
tempd1 = 0

a15 = i - 256
tempd1 =[ne] 1

end_quick_syms:
br = iprs

*(pba + $read_more) =ub tempd1
nop
INTERFRAME ENCODING

;Written by Jim Witham, 3-27-97
; addconv.s

; To replace the following C Code that just took too much time...
; for(i=0;i<2048;i++)
; {
; img[i] = img[i] + oldmean;
; pic[i] = img[i];
; if (img[i]>255)
; pic[i] = 255;
; if (img[i]<0)
; pic[i] = 0;
; }


.global $add_n_conv8

$add_n_conv8:

d1 = d1<<3 ;d1 is mean that is passed in
d6 =rh d1 ;replicate it to both words

sr = 0xad
x0 = 1
x1 = 2

d0 = 0
d4 = 0x08000800 ; 0800>>3 = 0100
d5 = 0x07f807f8 ; 07f8>>3 = 00ff
a2 = &*(dba + 1)

le2 = L48
lrs2 = 1023

a1 = (dba)
a0 = &*(dba)

d1 = *(a1++=[x0])
d1 =me d1 + d6
d3 = (d0 & @mf) | (d1 & ~@mf) ;if (val-mean<0) then d3 = 0 else d3 = d1
d2 =me d1 - d4
d3 = (d3 & @mf) | (d5 & ~@mf) ;if (val-mean>0x800) then d3 = 0x7f8 else d3 = d3

d3 = d3 >>u 3 ; shift down to proper scaling

*(a2++=[x1]) =b d3
|| d2 =h1 d3

L48:

*(a0++=[x1]) =b d2

sr = 0x36

br = iprs
nop
nop
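
The header comment of addconv.s quotes the C loop that the assembly routine replaces: add the previous frame's mean to each sample and clamp the result to the 8-bit range. As a reference for checking the assembly against, the loop can be sketched in portable C as below. The function name `add_mean_and_clamp` and its parameter list are assumptions made for this sketch; the original fragment operates on fixed arrays `img` and `pic` of 2048 entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference version of the loop quoted in the addconv.s
 * header: img[i] is updated in place with the mean added back, and
 * pic[i] receives the same value clamped to 0..255. */
static void add_mean_and_clamp(int *img, unsigned char *pic,
                               int oldmean, size_t n)
{
    assert(img != NULL && pic != NULL);
    for (size_t i = 0; i < n; i++) {
        img[i] += oldmean;
        if (img[i] > 255)
            pic[i] = 255;                     /* saturate high */
        else if (img[i] < 0)
            pic[i] = 0;                       /* saturate low */
        else
            pic[i] = (unsigned char)img[i];   /* already in range */
    }
}
```

The assembly version appears to obtain the same saturation with masked multiple-arithmetic operations on two packed samples per word, which is why its constants (0x0800, 0x07f8) are written at a scale of 8 before the final right shift by 3.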
/* DECLARATIONS FOR ARITHMETIC CODING AND DECODING */

/* Size of arithmetic code values */ 

#define Code_value_bits 9 /* Number of bits in a code value */ 

typedef short code_value; /* Type of arithmetic code value */ 

#define Top_value (((long)1<<Code_value_bits)-1) /* Largest code val */

/* Half and Quarter points in code value range */

#define First_qtr (Top_value/4+1) /* Points after first quarter */

#define Half (2*First_qtr) /* Points after first half */

#define Third_qtr (3*First_qtr) /* Points after third quarter */ 
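
With `Code_value_bits` set to 9, the macros above evaluate to Top_value = 511, First_qtr = 128, Half = 256 and Third_qtr = 384. The short sketch below repeats the definitions so it compiles on its own and adds a small classifier that reports which quarter of the code range a value falls in. The function `quarter_of` is a hypothetical helper introduced only for illustration.

```c
#include <assert.h>

#define Code_value_bits 9
#define Top_value   (((long)1 << Code_value_bits) - 1)  /* 511 */
#define First_qtr   (Top_value / 4 + 1)                 /* 128 */
#define Half        (2 * First_qtr)                     /* 256 */
#define Third_qtr   (3 * First_qtr)                     /* 384 */

/* Hypothetical helper: return the quarter (0..3) of [0, Top_value]
 * that contains v. */
static int quarter_of(long v)
{
    assert(v >= 0 && v <= Top_value);
    if (v < First_qtr) return 0;
    if (v < Half)      return 1;
    if (v < Third_qtr) return 2;
    return 3;
}
```

A 9-bit code value is narrow compared with the 16 bits commonly used for integer arithmetic coders, so the total of the frequency counts has to stay correspondingly small for the interval arithmetic to remain exact.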
/*****************************************************************************
**
** bitebits.s (PP Program)
**
** Written by Jim Witham, Code 472300D, 939-3599
**
** File that contains the following assembly language subroutines:
**
** $new_dbits_wo
** $new_dbits_nwo
** $new_dbits2
** $do_syms
** $encode_symbol
** I_DIV_JW
** $update_model
** $bit_plus_follow
** $new_output_bit
**
*****************************************************************************/
.global $new_dbits_wo 
.global $new_dbits_nwo 

.global $new_dbits2 

.global $output_bit 
.global $do_syms

.global $T_BYTES
.global $byte_stream
.global $STOP

.global $bit_index

.global $ztl ;signed short pointer sh *(xba + $ztl)
.global $THRESH ;signed int sw *(xba + $THRESH)
.global $stats_flag ;unsigned char pointer *(xba + $stats_flag)
.global $char_to_index ;unsigned char pointer &*(xba + $char_to_index)
.global $stats_val ;signed short pointer *(xba + $stats_val)
.global $TMASK ;unsigned short uh *(xba + $TMASK)
.global $BITE ;signed int sw *(xba + $BITE)
.global $SHFT_DN ;signed int sw *(xba + $SHFT_DN)

.global $update_model 
.global $encode_symbol 

.global $new_output_bit 

.global $time_cdwo 
.global $time_cdnwo 
.global $time_cd2wo 
.global $time_sym 

.global $list 
.global $list_index 

.global $pruned_children 

.global $getaway_address 
.global $quick_getaway 

.global $qstats_val 
.global $qindex
.global $qcount 


tempd1 .set d0
stats_flag .set d1
stats_val .set d2
tempd2 .set d3
tflg .set d4
tmask .set d5
tempd3 .set d6
t .set d7
sym .set a12

maxsyms .set 63


.align 512 


/*******************************************************************
* *
* Function : $new_dbits_wo *
* Args : m,c1,c3,s *
* Passed in :(d1,d2,d3,d4) *
* *
* Description: *
* m - the current index of this coefficient *
* c1 - the relative index of the first child *
* c3 - the relative index of the third child *
* s - the subblock that we are in *
* *
* $new_dbits_wo will be called to scan a coefficient that is *
* on the second or third tier of the tree. The coefficient *
* has already been checked for significance on a previous *
* bit-plane; now it is checked against the current bit-plane *
* threshold for significance, or pass down. *
* *
* Return Values: *
* None *
* *
*******************************************************************/


$new_dbits_wo:

d0 = &*(sp --= 28)
*(sp + 16) =w iprs
*(sp + 12) =w a4
|| *(sp + 8) =w d6
*(sp + 4) =w a12
|| *(sp + 0) =w d7
*(sp + 24) = d4 ;save s onto stack

d5 = d1

a1 =uw *(xba + $stats_val) ;*(a1) = stats_val[s*256]
a2 =uw *(xba + $stats_flag)
a3 =uw *(xba + $ztl) ;*(a3 + [x0]) = ztl[]

d6 = d4 << 9 ; 256 elements each 2 bytes long times s (<< 9 = 512)
a1 = a1 + d6
d6 = d4 << 8 ; 256 elements each 1 byte long times s (<< 8 = 256)
a2 = a2 + d6
d6 = d4 << 7 ;ztl[s*64] (where ztl[] is a short)
a3 = a3 + d6
d5 = d5 + (d4<<8) ;m = m + s*256
*(sp + 20) =uh d5
a0 = a2 ;*(a0) = stats_flag[s*256]
x0 = d1 ;x0 = m
x1 = d3 ;*(a2 + [x1]) = stats_flag[c3]
x2 = d3 + 1 ;*(a2 + [x2]) = stats_flag[c4 = c3 + 1]
a2 = a2 + d2 ;*(a2) = stats_flag[s*256 + c1]
;*(a2 + [1]) = stats_flag[c2 = c1 + 1]

;*(a1 + [x0]) = stats_val[s*256 + m]
;*(a0 + [x0]) = stats_flag[s*256 + m]
;*(a2) = stats_flag[s*256 + c1]
;*(a2 + [1]) = stats_flag[s*256 + c2] (c2 = c1 + 1)
;*(a2 + [x1]) = stats_flag[s*256 + c3]
;*(a2 + [x2]) = stats_flag[s*256 + c4] (c4 = c3 + 1)


tmask =uh *(xba + $TMASK) 


tflg =sh *(a3 + [x0])

a15 = tflg&tmask

tflg =111 tflg =[ne] al5 

tflg = tflg - 0
br =[z] no_change
stats_val =sh *(a1 + [x0]) ;fetch stats_val[m]
tempd3 = |stats_val| ;take abs(stats_val[m])

;fetch ztl[m + s*64]
;calculate tflg

tempd1 =sw *(xba + $time_cdwo)
tempd1 = tempd1 + 1
*(xba + $time_cdwo) =w tempd1

tempd1 = tflg << 1
a4 = tflg

call = mod_flags

d4 = *(sp + 24) ;reload s from stack into d4

tempd2 =ub *(a2)

tflg = a4


no_change:

t = tempd3 & tmask ;form TMASK&abs(stats_val[m])

tempd1 =sw *(xba + $SHFT_DN) ;fetch SHFT_DN

tempd1 = -tempd1

t = t >>u -tempd1 ;form (TMASK&abs(stats_val[m]))>>SHFT_DN
;compute sym

tempd1 =sw *(xba + $BITE)
tempd1 = %tempd1
t = t - 0
tempd1 =[eq] a15
stats_val = stats_val - 0
tempd1 =[le] a15

finish_up:

tempd2 = (-tflg)&1
tempd2 = tempd2 + t
tempd2 = tempd2 + tempd1

x1 = tempd2

t = t - 0
br =[le] L68
nop
nop

;if (t>0)
; stats_val[m] = (abs(stats_val[m])-((t*THRESH)/(1<<(BITE-1))+THRESH/(2<<(BITE-1))));
;/* or */
; stats_val[m] = (abs(stats_val[m]) - ((t*THRESH)>>(BITE-1)) + (THRESH>>BITE));

tempd1 =sw *(xba + $BITE)
tempd2 =uh *(xba + $THRESH)

tflg = 1 - tempd1

tmask =u tempd2 * t
tempd1 = - tempd1

a1 = *(pba + $qstats_val)

a2 = *(pba + $qindex)

x2 =ub *(pba + $qcount)

tempd3 = tempd3 - (tmask >>u -tflg)
tempd3 = tempd3 - (tempd2 >>u -tempd1)

*(a1 + [x2]) =h tempd3
tempd1 =uh *(sp + 20)

*(a2 + [x2]) =h tempd1
tempd1 = x2 + 1
*(pba + $qcount) =ub tempd1


L68: 

tempd1 = 0
t = t - 0

tempd1 =[nz] 4 ;calculate (t!=0)<<2

*(a0 + [x0]) =ub tempd1

a15 = tempd1 - 0

br =[eq] nosig

x0 =uh *(pba + $list_index)

a15 = x0 - 254

br =[ge] nosig

tempd1 =uh *(sp + 20)
a0 =uw *(pba + $list)

tempd2 = x0 + 1

*(a0 + [x0]) =uh tempd1

*(pba + $list_index) =uh tempd2

nosig:

a0 = &*(pba + $sym_array)
x0 =uh *(pba + $sym_index)

nop

tempd2 = x0 + 1

*(pba + $sym_index) =uh tempd2

;sym_array[sym_index++] = sym;

*(a0 +[x0]) =ub x1

;if (sym_index>63) do_syms();

tempd1 = tempd2 - maxsyms

call =[gt] $do_syms

nop

nop

done_more:

a4 = *(sp + 12)

a12 = *(sp + 4)
br = *(sp + 16)

d6 =sw *(sp + 8)

|| d7 =sw *(sp + 0)
d0 = &*(sp ++= 28)


/*****************************************************************
*
* Function : $new_dbits_nwo
* Args : m,c1,c3,s
* Passed in :(d1,d2,d3,d4)
*
* Description:
* m - the current index of this coefficient
* c1 - the relative index of the first child
* c3 - the relative index of the third child
* s - the subblock that we are in
*
* $new_dbits_nwo will be called when the current coefficient
* and all of its children have been deemed insignificant,
* from the pass down flag.
*
* Return Values:
* None
*
*****************************************************************/
$new_dbits_nwo:

a2 =uw *(xba + $stats_flag)
d5 = d4 << 8
a2 = a2 + d5
a0 = a2 ;*(a0 + [x0]) = stats_flag[s*256
x0 = d1 ;x0 = m
x1 = d3 ;*(a2 + [x1]) = stats_flag[s*256 + c3]
x2 = d3 + 1 ;*(a2 + [x2]) = stats_flag[c4 = c3 + 1]
a2 = a2 + d2 ;*(a2) = stats_flag[s*256 + c1]
;*(a2 + [1]) = stats_flag[c2 = c1 + 1]

; tempd2 =ub *(a0 + [x0])
; tempd2 = tempd2 & 252
; *(a0 + [x0]) =ub tempd2
; tempd1 =sw *(xba + $time_cdnwo)
; tempd1 = tempd1 + 1
; *(xba + $time_cdnwo) =w tempd1

tempd1 = 2
tempd2 =ub *(a2)

mod_flags:

tempd2 = tempd2 | tempd1
*(a2) =ub tempd2

tempd2 =ub *(a2 + [1])
tempd2 = tempd2 | tempd1
*(a2 + [1]) =ub tempd2

tempd2 =ub *(a2 + [x1])
tempd2 = tempd2 | tempd1
*(a2 + [x1]) =ub tempd2

tempd2 =ub *(a2 + [x2])

tempd2 = tempd2 | tempd1
*(a2 + [x2]) =ub tempd2

a2 = &*(pba + $pruned_children)

x2 = d4 ;s is in d4

nop

tempd1 =ub *(a2 + x2)
br = iprs

tempd1 = tempd1 + 1
*(a2 + x2) =ub tempd1

.align 512
/*******************************************************************
* *
* Function : $new_dbits2 *
* Args : m,s *
* Passed in :d1,d2 *
* *
* Description: *
* $new_dbits2 will be called to scan the fourth tier in the *
* zero tree. *
* *
* Return Values: *
* None *
* *
*******************************************************************/


$new_dbits2:

d0 = &*(sp --= 24)

*(sp + 16) =w iprs

*(sp + 8) =w a4
*(sp + 4) =w d6

*(sp + 0) =w d7

a0 =uw *(xba + $stats_flag)

a1 =uw *(xba + $stats_val) ;*(a1) = stats_val[s*256]

d5 = d1

d3 = d2 << 9
a1 = a1 + d3

d3 = d2 << 8
a0 = a0 + d3

d5 = d5 + (d2 << 8)

*(sp + 20) =uh d5 ;index = m + s*256

x0 = d1 ;x0 = m

;*(a0 + [x0]) = stats_flag[s*768 + m]

;*(a1 + [x0]) = stats_val[s*768 + m]

tmask =uh *(xba + $TMASK)

stats_val =sh *(a1 + [x0]) ;fetch stats_val[m]

tempd3 = |stats_val| ;take abs(stats_val[m])

t = tempd3 & tmask ;form TMASK&abs(stats_val[m])

tempd1 =sw *(xba + $time_cd2wo)
tempd1 = tempd1 + 1
*(xba + $time_cd2wo) =w tempd1

tempd1 =sw *(xba + $SHFT_DN) ;fetch SHFT_DN

tempd1 = -tempd1

t = t >>u -tempd1 ;form (TMASK&abs(stats_val[m]))>>SHFT_DN
;compute sym

tempd2 =sw *(xba + $BITE)
tempd2 = %tempd2

stats_val = stats_val - 0
tempd2 =[le] a15

t = t - 0
tempd2 =g[eq] a15

tempd2 =[ne] tempd2 + 1 ;form (t!=0) and add to (1<<BITE)-1

t = t - 0
br =[le] L69

x1 = tempd2 + t ;sym = t + (((t!=0)&&(stats_val[m]>0)) ? ((1<<BITE)-1) : 0) + (t!=0)

a4 = &*(xba + $char_to_index)

;if (t>0)
; stats_val[m] = (abs(stats_val[m])-((t*THRESH)/(1<<(BITE-1))+THRESH/(2<<(BITE-1))));
;/* or */
; stats_val[m] = (abs(stats_val[m]) - ((t*THRESH)>>(BITE-1)) + (THRESH>>BITE));

tempd1 =sw *(xba + $BITE)
tempd2 =uh *(xba + $THRESH)

tflg = 1 - tempd1

tmask =u tempd2 * t
tempd1 = - tempd1

a1 = *(pba + $qstats_val)
a2 = *(pba + $qindex)
x2 =ub *(pba + $qcount)

tempd3 = tempd3 - (tmask >>u -tflg)
tempd3 = tempd3 - (tempd2 >>u -tempd1)

*(a1 + [x2]) =h tempd3
tempd1 =uh *(sp + 20)

*(a2 + [x2]) =h tempd1
tempd1 = x2 + 1
*(pba + $qcount) =ub tempd1


L69: 

tempd1 = 0
t = t - 0

tempd1 =[nz] 4 ;calculate (t!=0)<<2

*(a0 + [x0]) =ub tempd1

a15 = tempd1 - 0

br =[eq] nosig2

x0 =uh *(pba + $list_index)

a15 = x0 - 254

br =[ge] nosig2

tempd1 =uh *(sp + 20)

a0 =uw *(pba + $list)

tempd2 = x0 + 1

*(a0 + [x0]) =uh tempd1
*(pba + $list_index) =uh tempd2
nosig2:

a0 = &*(pba + $sym_array)
x0 =uh *(pba + $sym_index)

tempd2 = x0 + 1

*(pba + $sym_index) =uh tempd2
;sym_array[sym_index++] = sym;

*(a0 +[x0]) =ub x1

;if (sym_index>63) do_syms();

tempd1 = tempd2 - maxsyms

call =[gt] $do_syms

nop

nop

a4 =sw *(sp + 8)

br = *(sp + 16)

d6 =sw *(sp + 4)

|| d7 =sw *(sp + 0)
d0 = &*(sp ++= 24)
.global $index_to_char
.global $No_of_symbols
.global $bits_to_follow
.global $high
.global I_DIV
.global $freq
.global $low
.global $cum_freq
.global $bit_plus_follow

.global $sym_array
.global $sym_index

.align 512 

/******************************************************************* 
* * 

* Function : do_syms * 

* Args : none * 

* * 

* Description: * 

* do_syms will be called to empty the symbol cache. * 


* Return Values: * 

* None * 

* * 
*******************************************************************/


$do_syms:

d0 = &*(sp --= 20)

*(sp + 16) =w iprs

*(sp + 12) =w a4

|| *(sp + 8) =w d6

*(sp + 4) =w a12

|| *(sp + 0) =w d7

a12 = &*(xba + $char_to_index)
a4 = &*(xba + $sym_array)

d6 =uh *(xba + $sym_index)
x8 =ub *a4++


more_syms:

call = $encode_symbol
d7 =ub *(a12 + x8)
d1 = d7


call = $update_model

nop

d1 = d7

d1 =sw *(xba + $time_sym)
d1 = d1 + 1

*(xba + $time_sym) =w d1
d1 =sw *(xba + $STOP)
d1 = d1 - 1
br =[eq] get_out
x8 =ub *a4++
d6 = d6 - 1

br =[gt] more_syms

get_out: 

a0 = *(pba + $qstats_val)
a1 = *(pba + $qindex)

d1 =uh *(xba + $sym_index)
a4 = &*(xba + $sym_array)

|| d6 = d1 - d6

a2 = *(pba + $stats_val)

do_over:

d1 =ub *a4++
d1 = d1 - 1
br =[le] too_small
nop
nop

x0 =h *a1++
d1 =h *a0++

*(a2 + [x0]) =h d1

too_small:

d6 = d6 - 1

br =[gt] do_over

nop

nop

*(pba + $qcount) =ub a15
get_out2: 


a4 =w *(sp + 12)
d6 =w *(sp + 8)
a12 =w *(sp + 4)
d7 =w *(sp + 0)
d3 = *(sp + 16)
d0 = d3
|| d2 = *(pba + $getaway_address)
d1 =ub *(pba + $quick_getaway)
d1 = d1 - 1
d0 =[eq] d2
d1 =sw *(xba + $STOP)
d1 = d1 - 0
d0 =[eq] d3
br = d0
br = *(sp + 16)

*(pba + $sym_index) =uh a15
d0 = &*(sp ++= 20)
/*******************************************************************
* *
* Function : encode_symbol *
* Args : none *
* *
* Description: *
* encode_symbol will be called to perform the arithmetic *
* encoding of a symbol. This routine was lifted from a C *
* compiled program and included here, for cache coherency. *
* *
* Return Values: *
* None *
* *
*******************************************************************/

$encode_symbol:

x0 = d1
a0 = &*(xba + $cum_freq)
d3 =sw *(xba + $low)
d2 =sw *(xba + $high)
d1 = x0 << 1
d4 = d2 - d3
|| d2 =g a0
a1 = d1 + d2
|| d0 = &*(sp -= 4)
d4 = d4 + 1
|| *(sp) =w iprs
d1 =uh1 d4
|| d5 =sh *(a1 - 2)
d1 =u d5 * d1
|| d2 =uh1 d5
d2 =u d2 * d4
d1 =u d5 * d4
|| d2 = d2 + d1
call = I_DIV_JW
d1 = d1 + (d2 << 16)
|| x1 =sh *a0
d2 = x1
d0 =uh1 d4
|| d1 =sh *(a0 + [x0])
d0 =u d0 * d1
|| d2 =uh1 d1
d2 =u d4 * d2
d1 =u d4 * d1
|| d2 = d0 + d2
d1 = d1 + (d2 << 16)
d4 = d5 + d3
|| d2 = x1
call = I_DIV_JW
d4 = d4 - 1
*(xba + $high) =w d4
d2 = d5 + d3
d1 = d4 - (1 \\ 8)
br =[ge] L25
*(xba + $low) =w d2
d1 =[ge] d2 - (1 \\ 8)

L19:

call = $bit_plus_follow
nop
d1 = 0
br = L23
d1 =sw *(xba + $high)
d1 = d1 << 1

L20:

d1 =sh *(xba + $bits_to_follow)
d2 = d1 + 1
|| d3 =sw *(xba + $low)
*(xba + $bits_to_follow) =h d2
d2 = d3 - (1 \\ 7)
|| d1 =sw *(xba + $high)
br = L22
d1 = d1 - (1 \\ 7)
|| *(xba + $low) =w d2
*(xba + $high) =w d1

L21:

call = $bit_plus_follow
nop
d1 = 1

L22:
L23:

d1 =sw *(xba + $low)
d1 = d1 - (1 \\ 8)
d2 =sw *(xba + $high)
d1 = d2 - (1 \\ 8)
*(xba + $low) =w d1
*(xba + $high) =w d1
d1 = d1 << 1

d4 = d1 + 1
d1 =sw *(xba + $low)
d2 = d1 << 1
d1 = d4 - (1 \\ 8)
br =[lt] L19
*(xba + $high) =w d4
*(xba + $low) =w d2

L24:
L25:

d1 = d2 - (1 \\ 8)
br =[ge] L21
nop
d1 =[lt] d2 - (1 \\ 7)
L30:

nop
nop
nop
nop

br =[lt] L30
d1 =[ge] d4
br =[lt] L20
br =[ge] L30

br = *(sp)

d0 = &*(sp ++= 4)
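
The first half of encode_symbol (the two multiply and I_DIV_JW sequences) appears to be the interval-narrowing step of a conventional adaptive arithmetic coder: the current range is scaled by cum_freq[symbol-1] and cum_freq[symbol], each divided by the total in cum_freq[0]. The following is a minimal C sketch of that step only, under that reading of the listing. The names coder_interval_t and narrow_interval are hypothetical, and the bit-output loop (labels L19 to L30) is not reproduced.

```c
#include <stdint.h>

/* Hypothetical container for the coder state ($low and $high in the listing). */
typedef struct {
    int32_t low;
    int32_t high;
} coder_interval_t;

/*
 * Narrow [low, high] to the sub-range allotted to `symbol`.
 * Assumed layout: cum_freq[0] holds the total count and cum_freq[i] the
 * summed counts of all symbols with index greater than i.
 */
static void narrow_interval(coder_interval_t *iv, const int16_t *cum_freq, int symbol)
{
    int32_t range = iv->high - iv->low + 1;
    int32_t total = cum_freq[0];

    /* 64-bit intermediate stands in for the split 16x16 multiplies of the PP code. */
    iv->high = iv->low + (int32_t)(((int64_t)range * cum_freq[symbol - 1]) / total) - 1;
    iv->low  = iv->low + (int32_t)(((int64_t)range * cum_freq[symbol]) / total);
}
```

With symbol counts 1, 2, 1 (cum_freq = {4, 3, 1, 0}) and a starting interval of [0, 255], coding symbol 2 narrows the interval to [64, 191].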


/*******************************************************************
*                                                                  *
* Function : I_DIV_JW                                              *
* Args     : none                                                  *
*                                                                  *
* Description:                                                     *
*   I_DIV_JW will be called to perform an Integer Divide.          *
*                                                                  *
* Return Values:                                                   *
*   None                                                           *
*                                                                  *
*******************************************************************/

;* I_DIV.ASM v1.10 - Integer Divide *
;* Copyright (c) 1993-1995 Texas Instruments Incorporated *

; +--------------------------------------------------------------------+
; | i_div.asm = PP assembly program that is used to return a 32-bit    |
; |             signed integer quotient from 32-bit signed integer     |
; |             division when called by a C program.                   |
; +--------------------------------------------------------------------+

.global I_DIV_JW

; +--------------------------------------------------------------------+
; | 32-bit Signed Integer Word Divide Subroutine :                     |
; | o Input 32-bit signed integer Operand 1 is in d1 (numerator).      |
; | o Input 32-bit signed integer Operand 2 is in d2 (divisor).        |
; | o Output 32-bit signed integer is in d5 (Answer - quotient).       |
; | o Output 32-bit signed remainder is discarded.                     |
; | o 0 input divisor produces 0x80000000 output with overflow set.    |
; | o Quotient = 0x80000000 sets overflow.                             |
; | o Number of Stack Words used = 3.                                  |
; | o MF register is saved.                                            |
; |                                                                    |
; | o NOTE: Loop Counter 2 Registers are used but NOT restored !       |
; +--------------------------------------------------------------------+
; +--------------------------------------------------------------------+
; | 32 bit / 32 bit ===> 32 bit signed quotient                        |
; | Signed PP Integer Division                                         |
; | Numerator / Denominator = Quotient + Remainder (discarded)         |
; | Divide by 0 produces 80000000 and sets sr(V)                       |
; | Divide Overflow is not possible if Divisor is non-zero,            |
; | except 80000000/ffffffff = 80000000 will set sr(V).                |
; | MF register is preserved.                                          |
; +--------------------------------------------------------------------+


; .ptext ; PP assembly code

arg1: .set d1 ; input argument 1 Numerator (32 low bits)
arg2: .set d2 ; input argument 2 Divisor (32 bits)

ans: .set d5 ; answer = 32 bit signed quotient
Div: .set d3 ; Input Divisor
Num: .set d4 ; Input high Numerator = 0
Tmp: .set d5 ; ALU output for each DIVI

; .align 8*16 ; start on a 16-instruction boundary

I_DIV_JW:

; Signed Word Integer Divide: Ans = Op1 / Op2

Div = 0 - |arg2|             ; negate |divisor|
|| *(sp -= [3]) = Div        ; || push Div
br =[z] Div_By_0             ; Divide By 0 ?
Num = 0                      ; high numerator = 0
|| *(sp + [1]) = mf          ; || push mf
|| *(sp + [2]) = Num         ; || push Num
mf = |arg1|                  ; input lo |numerator|

lrse2 = 29 ; loop count - 1
Tmp = divi(Div, Num=Num) ; 1-st divide iterate
Tmp = divi(Div, Num=Tmp [n] Num) ; 2-nd divide iterate
LoopSW: Tmp = divi(Div, Num=Tmp [n] Num) ; divide iterate 3-32

ans = mf                     ; |ans| = mf
|| Div = *sp++               ; || pop Div
Num = arg1 ^ arg2            ; quotient sign
|| br = iprs                 ; || return
ans =[n] -ans                ; quotient is negative,
|| mf = *sp++                ; || pop mf
Num = *sp++                  ; pop Num

Div_By_0:                    ; Divide By 0     \_ Optional Error
Div_Ovfl:                    ; Divide Overflow /  Return Code

br = iprs                    ; return
|| Div = *sp++               ; || pop Div
mf = *sp++                   ; pop mf
ans = 0 - 1<<31              ; returns 0x80000000, sets sr(V)
|| Num = *sp++               ; || pop Num ...[END]
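
The contract stated in the header comments (divide the magnitudes, fix the sign afterwards, return 0x80000000 for a zero divisor or for 80000000/ffffffff) can be illustrated in portable C. This is only a sketch: the PP routine relies on the hardware divi iteration, while the code below substitutes an ordinary 32-step shift-and-subtract loop. The name idiv_sketch is hypothetical, and the sr(V) overflow flag is not modelled.

```c
#include <stdint.h>

/* Signed 32-bit divide; remainder discarded, 0x80000000 returned on error. */
static int32_t idiv_sketch(int32_t num, int32_t den)
{
    uint32_t n, d, q = 0, r = 0;
    int i;

    if (den == 0)
        return INT32_MIN;                       /* divide by zero */

    /* Work on magnitudes, as the listing does with |arg1| and |arg2|. */
    n = (num < 0) ? 0u - (uint32_t)num : (uint32_t)num;
    d = (den < 0) ? 0u - (uint32_t)den : (uint32_t)den;

    for (i = 31; i >= 0; i--) {                 /* one quotient bit per step */
        r = (r << 1) | ((n >> i) & 1u);
        if (r >= d) {
            r -= d;
            q |= (uint32_t)1 << i;
        }
    }

    if (q == 0x80000000u)                       /* only for |num| = 2^31, |den| = 1 */
        return INT32_MIN;
    return ((num ^ den) < 0) ? -(int32_t)q : (int32_t)q;
}
```

The quotient truncates toward zero, so idiv_sketch(7, -2) and idiv_sketch(-7, 2) both give -3.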


.global $update_model 
/*******************************************************************
*                                                                  *
* Function : update_model                                          *
* Args     : none                                                  *
*                                                                  *
* Description:                                                     *
*   update_model will be called to update the arithmetic           *
*   model's parameters. This routine was lifted from a C           *
*   compiled program and included here, for cache coherency.       *
*                                                                  *
* Return Values:                                                   *
*   None                                                           *
*                                                                  *
*******************************************************************/



$update_model:

a0 = &*(xba + $cum_freq)
nop

a1 = d1
|| d1 =sh *a0
d1 = d1 - 75
br =[ne] L6
nop
a2 =g [ne.ncvz] a1

d2 =sw *(xba + $No_of_symbols)
d1 = d2 - 0
br =[lt] L6
nop
a2 =g [lt.ncvz] a1

d1 =g a0
le1 = L5 - 8
d5 = d2 << 1
|| d3 = &*(xba + $freq)
a0 = d5 + d1
d4 = d2 << 1
|| lrs1 = d2
a8 = d4 + d3
d2 = 0

L4:

d1 =sh *a8
d1 = d1 + 1
d3 = d1 >>u 31
d1 = d1 + d3
d1 = d1 >>s 1
d1 =sh0 d1
*a8 =h d1
*a0-- =h d2
d1 =sh *a8--
d2 = d2 + d1
L5:

a2 =g a1

L6:

d3 = &*(xba + $freq)
d1 = a2 << 1
d5 = a2 << 1
d4 = d3 + d1
a0 = d5 + d3
a8 = d4 - 2
nop

d4 =sh *a8
|| d3 =sh *a0
d3 = d3 - d4
br =[ne] L9
d2 =g [ne.ncvz] a2
d3 =[ne] d2 - a1

L7:

d1 = d1 - 2
d3 =sh *--a8
a2 = a2 - 1
d4 =sh *--a0
d3 = d4 - d3
br =[eq] L7
nop
nop

L8:

d2 =g a2
d3 = d2 - a1

L9:

br =[ge] L11
nop
nop

x0 = &*(xba + $index_to_char)
nop
a8 =ub *(a2 + x0)
x8 = &*(xba + $char_to_index)
a9 =ub *(a1 + x0)
*(a2 + x0) =b a9
*(a1 + x0) =b a8
*(a8 + x8) =b a1
*(a9 + x8) =b a2

L11:

d3 =sh *a0
d3 = d3 + 1
d3 = a2 - 0
|| *a0 =h d3
br =[le] L15
br =[le] L16
nop
le2 = L14 - 8
d2 = a2 - 1
d3 = &*(xba + $cum_freq)
lrs2 = d2
a0 = d1 + d3
nop

L13:

d1 =sh *--a0
d1 = d1 + 1
*a0 =h d1

L14:

br = L16
nop

L15:

nop

L16:

br = iprs
nop
nop
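
Because the header says update_model was lifted from compiled C, its structure can be sketched back in C: halve all counts when the total reaches a ceiling, move the coded symbol up to keep the table sorted by frequency, then bump its count and the cumulative counts above it. The sketch below follows that reading of the listing and is not the original source. The three-symbol alphabet and the name update_model_sketch are hypothetical, and the ceiling of 75 is the constant as it appears in the listing and may be an extraction error.

```c
#define NO_OF_SYMBOLS 3      /* hypothetical small alphabet for illustration */
#define MAX_FREQUENCY 75     /* constant as read from the listing; an assumption */

static short freq[NO_OF_SYMBOLS + 1];          /* freq[0] must stay 0 (sentinel) */
static short cum_freq[NO_OF_SYMBOLS + 1];      /* cum_freq[0] is the total count */
static unsigned char index_to_char[NO_OF_SYMBOLS + 1];
static unsigned char char_to_index[256];

static void update_model_sketch(int symbol)
{
    int i;

    /* Halve every count once the total hits the ceiling (loop L4 in the listing). */
    if (cum_freq[0] == MAX_FREQUENCY) {
        int cum = 0;
        for (i = NO_OF_SYMBOLS; i >= 0; i--) {
            freq[i] = (short)((freq[i] + 1) / 2);
            cum_freq[i] = (short)cum;
            cum += freq[i];
        }
    }

    /* Find the first index holding the same count as `symbol` (loop L7). */
    for (i = symbol; freq[i] == freq[i - 1]; i--)
        ;

    /* Swap the two table entries so the table stays sorted by count. */
    if (i < symbol) {
        unsigned char ch_i = index_to_char[i];
        unsigned char ch_symbol = index_to_char[symbol];
        index_to_char[i] = ch_symbol;
        index_to_char[symbol] = ch_i;
        char_to_index[ch_i] = (unsigned char)symbol;
        char_to_index[ch_symbol] = (unsigned char)i;
    }

    /* Increment the count and all cumulative counts above it (loop L13). */
    freq[i] += 1;
    while (i > 0) {
        i -= 1;
        cum_freq[i] += 1;
    }
}
```

The sentinel freq[0] = 0 guarantees that the search loop stops at index 1 at the latest, provided every real symbol has a count of at least 1.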


.global $bit_plus_follow 


/*******************************************************************
*                                                                  *
* Function : bit_plus_follow                                       *
* Args     : none                                                  *
*                                                                  *
* Description:                                                     *
*   bit_plus_follow will be called to output several bits that     *
*   have been encoded by the arithmetic encoder.                   *
*                                                                  *
* Return Values:                                                   *
*   None                                                           *
*                                                                  *
*******************************************************************/


$bit_plus_follow:

d0 = &*(sp -= 8)
*(sp + 4) =w iprs
call = $new_output_bit
nop
d6 = d1
|| *(sp + 0) =w d6

d1 =sh *(xba + $bits_to_follow)
d1 = d1 - 0
br =[le] L36
nop
d1 =[gt] d6 - 0
d6 = 1 || d6 =[ne] a15

L34:

call = $new_output_bit
nop
d1 = d6

d1 =sh *(xba + $bits_to_follow)
d1 = d1 - 1
d1 =sh0 d1
d1 = d1 - 0
|| *(xba + $bits_to_follow) =h d1
br =[gt] L34
nop
nop

L36:

br = *(sp + 4)
nop
d6 =sw *(sp + 0)
d0 = &*(sp ++= 8)
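
In C terms the routine above emits the given bit and then flushes the pending "follow" bits, each being the complement of that bit. A minimal sketch follows, assuming that reading of the assembly. The emit_bit logger is a hypothetical stand-in for new_output_bit so that the block is self-contained.

```c
static int bits_to_follow;        /* pending opposite bits */
static int emitted[64];           /* hypothetical log standing in for the bitstream */
static int emitted_count;

/* Hypothetical stand-in for new_output_bit: just records the bit. */
static void emit_bit(int bit)
{
    if (emitted_count < 64)
        emitted[emitted_count++] = bit;
}

/* Emit `bit`, then bits_to_follow copies of its complement. */
static void bit_plus_follow_sketch(int bit)
{
    emit_bit(bit);
    while (bits_to_follow > 0) {
        emit_bit(!bit);
        bits_to_follow -= 1;
    }
}
```

With bits_to_follow set to 3, a call with bit 1 records the sequence 1, 0, 0, 0 and leaves bits_to_follow at zero.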


.global $bit_index 

/*******************************************************************
*                                                                  *
* Function : new_output_bit                                        *
* Args     : d1 - the bit to append                                *
*                                                                  *
* Description:                                                     *
*   new_output_bit will be called to append a single bit to        *
*   the bitstream array.                                           *
*                                                                  *
* Return Values:                                                   *
*   None                                                           *
*                                                                  *
*******************************************************************/


$new_output_bit:

d2 =sw *(xba + $STOP)
d2 = d2 - 1
br =[eq] iprs

d2 =uh *(xba + $bit_index)
d4 = d2 >>u 3
|| d3 =uw *(xba + $byte_stream)
a0 = d4 + d3
d0 =uh *(xba + $T_BYTES)
d4 = (d2&7)
|| d3 =ub *a0
d3 = d3 | (d1 << d4)
*a0 =b d3
|| d5 = d2 + 1
*(xba + $bit_index) =h d5
d5 = d5 - (d0 << 3)
br =g iprs
d1 = 1 || d1 =[lt] a15 ;calculate new STOP value
*(xba + $STOP) =w d1


end_do_syms2: 
nop 

.global end_do_syms2 
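
The bit-packing performed by new_output_bit can be summarised in C: the bit is OR-ed into byte bit_index >> 3 at position bit_index & 7 (least significant bit first), the index advances, and STOP is raised once T_BYTES * 8 bits have been written. The sketch below assumes that reading of the assembly. The 16-byte buffer and the name new_output_bit_sketch are hypothetical, and the mask on the incoming bit is a defensive addition not present in the listing.

```c
static unsigned char byte_stream[16];   /* hypothetical small output buffer */
static unsigned short bit_index;        /* next free bit position */
static unsigned short T_BYTES = 16;     /* byte budget for the stream */
static int STOP;                        /* set to 1 when the budget is used up */

static void new_output_bit_sketch(int bit)
{
    if (STOP == 1)
        return;                         /* budget exhausted: drop the bit */

    /* LSB-first packing: bit k of the stream lands in bit (k & 7) of byte (k >> 3). */
    byte_stream[bit_index >> 3] |= (unsigned char)((bit & 1) << (bit_index & 7));
    bit_index++;

    /* Recompute STOP after every bit, as the listing does. */
    STOP = (bit_index >= (unsigned)(T_BYTES << 3)) ? 1 : 0;
}
```

Because the budget is checked per bit rather than per byte, the stream ends exactly at the requested byte count, which fits the fixed-rate behaviour described in the report's introduction.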
mpcl -s -g -c -i\pcic80\include mp.c

mvplnk -x mp.obj num2.obj pp0.out pcic80.cmd -o mp.out -m mp.map

erase *.o
ppcl -s -k main.c
ppcl -s -k j_enc.c
ppcl -s -k modlp.c
ppcl -k -o2 codefast.c
ppasm bitebits.s
ppasm subpass3.s
ppasm hvtry.s
ppasm addconv.s
ppasm mean2.s
ppasm ztr.s
ppasm diffs.s

mvplnk -x main.o hvtry.o mean2.o addconv.o j_enc.o codefast.o ztr.o subpass3.o modlp.o diffs.o bitebits.o pcic80a.cmd -t runpp0 -o pp0.out -m pp0.map

/*******************************************************************
**
** codenew.c (PP Program)
**
** Written by Jim Witham, Code 472300D, 939-3599
**
** Contains code_dlist2 subroutine.
**
*******************************************************************/


#include <mvp.h>
#include "arith.h"
#include "modlp.h"

/* #define NO_MULTI_BITPLANE */ 

#define XSIZE 512 
#define YSIZE 256 
#define NSCALES 5 
#define AX 16 /* = XSIZE/BS */
#define AY 8 /* = YSIZE/BS */
#define BS 32 /* = 2 ^ NSCALES */

#define maxsyms 63 

/* REAL-TIME: 2 Zerotrees/partition, last subband truncated */ 

extern unsigned char emubrk; 

extern unsigned int *stomp_flag; 

extern unsigned char *stats_flag; 

extern int STOP; 

extern unsigned short syms_to_do; 

extern unsigned short *save_array; 
extern unsigned char read_more;
extern unsigned short left_off; 

extern int FIRST_DEC; 

extern unsigned char RB_DEC; 

extern unsigned char PASS_DEC; 

extern unsigned short THRESH; 

extern unsigned short *list; 
extern unsigned short list_index; 

extern unsigned char index_to_char[Max_No_of_symbols+1];

extern short *stats_val; 

extern unsigned short symbols_read; 

extern int BITE; 

extern unsigned char PASS_ENC; 
extern unsigned short TMASK;
extern int SHFT_DN,RB_ENC;


extern unsigned char sym_array[256]; 
extern unsigned short sym_index; 

extern short *qstats_val; 
extern unsigned short *qindex; 

extern unsigned char qcount; 

extern unsigned char pruned_children[8]; /* used to keep count of the number of pruned */
                                         /* children within a zero tree */

extern unsigned char pruned[8]; /* used to keep count whether a zero tree is pruned */


extern unsigned char total_links; /* keeps count of pruned trees */ 
extern unsigned char link_list[6]; /* list with unpruned trees in it */ 


unsigned char kids[64] = {   0,   4,   8,  12,  16,  18,  24,  26,
                            32,  34,  40,  42,  48,  50,  56,  58,
                            64,  66,  68,  70,  80,  82,  84,  86,
                            96,  98, 100, 102, 112, 114, 116, 118,
                           128, 130, 132, 134, 144, 146, 148, 150,
                           160, 162, 164, 166, 176, 178, 180, 182,
                           192, 194, 196, 198, 208, 210, 212, 214,
                           224, 226, 228, 230, 240, 242, 244, 246 };


unsigned char max_pruned[4] = { 0, 3, 12, 48}; 
unsigned char childplus[4] = { 0, 2, 4, 8}; 
unsigned char loopind[4] = { 0, 4, 16, 64};
unsigned char level; 
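
The entries of kids[] are not arbitrary: they match the per-level child-index formulas that code_dlist in j_enc.c computes inline (c = 4*m for m in 1..3, c = 8*(m/2) + 2*(m%2) for m in 4..15, and c = 16*(m/4) + 2*(m%4) for m in 16..63). The short sketch below regenerates the table from those formulas, which is a convenient cross-check of the listing. The function name build_kids is hypothetical.

```c
/* Rebuild the 64-entry child-index table from the per-level formulas. */
static void build_kids(unsigned char kids_out[64])
{
    int m;

    kids_out[0] = 0;
    for (m = 1; m < 4; m++)                       /* level 1 */
        kids_out[m] = (unsigned char)(4 * m);
    for (m = 4; m < 16; m++)                      /* level 2 */
        kids_out[m] = (unsigned char)(8 * (m / 2) + 2 * (m % 2));
    for (m = 16; m < 64; m++)                     /* level 3 */
        kids_out[m] = (unsigned char)(16 * (m / 4) + 2 * (m % 4));
}
```

The level boundaries 4, 16 and 64 are the same values held in loopind[].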


/*******************************************************************
*                                                                  *
* Function : code_dlist2                                           *
* Args     : none                                                  *
*                                                                  *
* Description:                                                     *
*   code_dlist2 scans six zero trees in order for significant      *
*   coefficients, it also prunes off zero trees that have no       *
*   significant coefficients below the current level.              *
*                                                                  *
* Return Values:                                                   *
*   None                                                           *
*                                                                  *
*******************************************************************/



void code_dlist2()
{
    int sym,i,p,s;
    unsigned short t,tt,j;
    unsigned char temp_links;
    register unsigned int m,c;

    /* Perform Dominant pass */
    while (STOP == 0)
    {
        for (s=0;s<7;s++) pruned[s] = 0;
        for (s=0;s<7;s++) pruned_children[s] = 0;

        for (s=0;s<6;s++)
            comp_dbits(s);

        m = 0;
        for (s=0;s<6;s++)
        {
            if (pruned_children[s] < 1)
            {
                link_list[m++] = s;
                pruned[s] = 1;
            }
        }
        total_links = m;

        for (s=0;s<7;s++) pruned_children[s] = 0;

        level = 1;
        m = 1;
        while ((m!=64) && (total_links!=0))
        {
            for(;m<loopind[level];m++)
            {
                c = kids[m];
                for (i=0;i<total_links;i++)
                {
                    s = link_list[i];
                    if ((stats_flag[m + s*256]&2)==2)
                        new_dbits_nwo(m,c,childplus[level],s);
                    else if ((stats_flag[m + s*256]&6)==0)
                        new_dbits_wo(m,c,childplus[level],s);
                }
            }

            temp_links = total_links;
            total_links = 0;
            for (i=0;i<temp_links;i++)
            {
                s = link_list[i];
                if (pruned_children[s] < max_pruned[level])
                {
                    link_list[total_links++] = s;
                    pruned_children[s] = 0;
                }
            }
            for (s=0;s<7;s++) pruned_children[s] = 0;
            level = level + 1;
        }

        if ((STOP==0) && (total_links!=0))
            for(m=64;m<256;m++)
            {
                for (i=0;i<total_links;i++)
                {
                    s = link_list[i];
                    if ((stats_flag[m+s*256]&6)==0)
                    {
                        new_dbits2(m,s);
                    }
                }
            }

        if (sym_index>0)
            do_syms();
        sym_index = 0;

        start_model(2);

        if (STOP==0)
            PASS_ENC += BITE;
        t = THRESH>>BITE;
        THRESH = THRESH>>BITE;

        if (PASS_ENC == BITE)
        {
            BITE = RB_ENC;
#ifdef NO_MULTI_BITPLANE
            BITE = 1;
#endif
            TMASK = THRESH;
            for(m=1;m<BITE;m++)
                TMASK = TMASK | (THRESH>>m);
        }
        else
        {
            BITE = 1;
            TMASK = THRESH;
        }

        SHFT_DN -= BITE;

        /* Subordinate Pass */
        if ((STOP == 0) && (list_index > 0))
            subpass(t);
        sym_index = 0;

        start_model(1<<(BITE+1));

        /* quick clear all passdown flags */
        for (i=0;i<384;i++)
            stomp_flag[i] = stomp_flag[i] & 0xfdfdfdfd;
        if ((THRESH & 0x03) != 0) STOP = 1;

    } /* end while */
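
The TMASK update inside the loop builds a mask covering BITE consecutive bit planes, starting at the plane selected by THRESH. A minimal, self-contained sketch of just that step is shown here. The function name build_tmask is hypothetical, and THRESH is assumed to be a single set bit (a power of two), as it is when initialised from 1 << global_maxval in j_enc.c.

```c
/* OR together `bite` successive bit planes, starting at the plane `thresh`. */
static unsigned short build_tmask(unsigned short thresh, int bite)
{
    unsigned short mask = thresh;
    int m;

    for (m = 1; m < bite; m++)
        mask = (unsigned short)(mask | (thresh >> m));
    return mask;
}
```

For THRESH = 0x0100 and BITE = 3 the mask is 0x01C0, so three bit planes are coded in one pass. With BITE = 1 the mask is THRESH itself.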
/*******************************************************************
**
** diffs.s (PP Program)
**
** Written by Jim Witham, Code 472300D, 939-3599
**
** File that contains the following assembly language subroutines:
**
** $diff1
** $diff2
** $sum1
**
*******************************************************************/

.global $diff1 
.global $diff2 
.global $sum1


/*******************************************************************
*                                                                  *
* Function  : $diff1                                               *
* Args      : none                                                 *
* Passed in :                                                      *
*                                                                  *
* Description:                                                     *
*   $diff1 takes 16 bit coefficients and subtracts off             *
*   16 bit buffered coefficients, and writes the results           *
*   overtop of the old 16 bit coefficients.                        *
*                                                                  *
*   The first set of coefficients reside in memory locations       *
*   $0000-$0BFE, and the buffered coefficients reside in           *
*   memory locations $0C00-$0FFE and $8000-$87FE.                  *
*                                                                  *
* Return Values:                                                   *
*   none                                                           *
*                                                                  *
*******************************************************************/



$diff1:

a0 = &*(dba) ;16 bit difference data - Data Ram 0
a1 = &*(dba) + 0xc00 ;16 bit difference data - Data Ram 1
sr = 0xad
le2 = loopend2
lrs2 = 0xff
nop
nop

d1 = *a0
d2 = *a1++
d1 =m d1 - d2
loopend2:
*a0++ = d1

a1 = &*(dba) + 0x8000 ;16 bit difference data - Data Ram 2
le2 = loopend3
lrs2 = 0x1ff
nop
nop

d1 = *a0
d2 = *a1++
d1 =m d1 - d2
loopend3:
*a0++ = d1

br = iprs
sr = 0x36
nop


/*******************************************************************
*                                                                  *
* Function  : $diff2                                               *
* Args      : none                                                 *
* Passed in :                                                      *
*                                                                  *
* Description:                                                     *
*   $diff2 takes 16 bit residual coefficients and does a           *
*   'special' subtraction from the previous buffered               *
*   coefficients. This special subtraction is characterized        *
*   by this simple pseudo-code:                                    *
*                                                                  *
*     if (buffer > 0) then result = buffer - residual              *
*     else result = buffer + residual                              *
*     if ((buffer - residual) == 0) then result = 0                *
*                                                                  *
*   The residual resides in memory locations $0000-$0BFE,          *
*   and the buffer resides in memory locations $0C00-$0FFE         *
*   and $8000-$87FE.                                               *
*                                                                  *
* Return Values:                                                   *
*   none                                                           *
*                                                                  *
*******************************************************************/


$diff2:

a0 = &*(dba) ;16 bit difference data - Data Ram 0
a1 = &*(dba) + 0xc00 ;16 bit difference data - Data Ram 1
le2 = loopend4a
lrs2 = 0x1ff
nop
nop

d1 =h *a0
d2 =h *a1++
d3 = d2 + d1
d2 = d2 - 0
d3 =[gt] d2 - d1
d4 = d2 - d1
d3 =[eq] a15
loopend4a:
*a0++ =h d3

a1 = &*(dba) + 0x8000 ;16 bit difference data - Data Ram 2
le2 = loopend5a
lrs2 = 0x3ff
nop
nop

d1 =h *a0
d2 =h *a1++
d3 = d2 + d1
d2 = d2 - 0
d3 =[gt] d2 - d1
d4 = d2 - d1
d3 =[eq] a15
loopend5a:
*a0++ =h d3

br = iprs
sr = 0x36
nop
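
The per-sample rule from the $diff2 header can be written directly in C. The sketch below mirrors the order of operations in the assembly (compute buffer + residual, override with buffer - residual when the buffer is positive, then force zero when buffer and residual are equal). The function name diff2_sample is hypothetical, and the sketch handles one sample rather than the two memory banks walked by the loops.

```c
/* 'Special' subtraction of one residual sample from one buffered sample. */
static short diff2_sample(short residual, short buffer)
{
    int result = buffer + residual;         /* default: buffer <= 0 */

    if (buffer > 0)
        result = buffer - residual;
    if (buffer - residual == 0)             /* applied last, for either sign */
        result = 0;
    return (short)result;
}
```

Note that the zero test is applied after the sign test, so a non-positive buffer equal to its residual also yields 0 rather than buffer + residual.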


/*******************************************************************
*                                                                  *
* Function  : $sum1                                                *
* Args      : none                                                 *
* Passed in :                                                      *
*                                                                  *
* Description:                                                     *
*   $sum1 implements the 'leaky' integrator by taking two          *
*   blocks of coefficients and multiplying one by a leakage        *
*   factor (<1, in our case 7/8) and then summing it with          *
*   another block of coefficients.                                 *
*   This is characterized by this simple pseudo-code:              *
*                                                                  *
*     result = (7/8) * (input + buffer)                            *
*                                                                  *
*   The residual resides in memory locations $0000-$0BFE,          *
*   and the buffer resides in memory locations $0C00-$0FFE         *
*   and $8000-$87FE.                                               *
*                                                                  *
* Return Values:                                                   *
*   none                                                           *
*                                                                  *
*******************************************************************/


$sum1:

a0 = &*(dba) ;16 bit difference data - Data Ram 0
a1 = &*(dba) + 0xc00 ;16 bit difference data - Data Ram 1
le2 = loopend6
lrs2 = 0x1ff
d3 = 7
nop

d1 =h *a0
d2 =h *a1++
d1 = d1 + d2
d1 = d1 * d3
d1 = d1 >>s 3
loopend6:
*a0++ =h d1

a1 = &*(dba) + 0x8000 ;16 bit difference data - Data Ram 2
le2 = loopend7
lrs2 = 0x3ff
nop
nop

d1 =h *a0
d2 =h *a1++
d1 = d1 + d2
d1 = d1 * d3
d1 = d1 >>s 3
loopend7:
*a0++ =h d1

br = iprs
sr = 0x36
nop
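
The leaky-integrator step reduces to one line of integer arithmetic per sample: add the two samples, multiply by 7, and shift right by 3. The sketch below follows the assembly, which applies the 7/8 factor to the sum as in the header's pseudo-code. The names floor_shift3 and sum1_sample are hypothetical. The helper exists only because right-shifting a negative value is implementation-defined in C, whereas the PP's >>s is an arithmetic shift.

```c
/* Arithmetic (flooring) shift right by 3, written portably. */
static int floor_shift3(int v)
{
    return (v >= 0) ? (v >> 3) : -((-v + 7) >> 3);
}

/* result = (7/8) * (input + buffer), in integer arithmetic. */
static short sum1_sample(short input, short buffer)
{
    int sum = (int)input + (int)buffer;
    return (short)floor_shift3(sum * 7);
}
```

For example, input 8 and buffer 8 give 14, and input 3 and buffer 2 give 4 (35 >> 3).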
/*****************************************************************************
**
** hvenc.s (PP Program)
**
** Written by Jim Witham, Code 472300D, 939-3599
**
** File that contains the following assembly language subroutines:
**
** $subdec_vert
** $subdec_horiz
**
*****************************************************************************/


.global $subdec_vert 
.global $subdec_horiz 


.align 2048 

/*******************************************************************
*                                                                  *
* Function  : $subdec_vert                                         *
* Args      : outerloop,innerloop                                  *
* Passed in : (d1,d2)                                              *
*                                                                  *
* Description:                                                     *
*   outerloop - the width of the coefficient patch                 *
*                                                                  *
*   innerloop - the height of the coefficient patch                *
*                                                                  *
*   $subdec_vert will be called to perform the wavelet             *
*   decomposition in the vertical direction.                       *
*                                                                  *
* Return Values:                                                   *
*   None                                                           *
*                                                                  *
*******************************************************************/


$subdec_vert:

lctl = 0x0 ;reset looping capability
lr0 = d1 - 1 ;outerloop
lr1 = d2 - 3 ;innerloop
a4 = d1
d7 = 0 ; k = 0

; Set up zero-overhead loops
le1 = InnerLoopEnd ;
ls1 = InnerLoop ;
le0 = OuterLoopEnd
ls0 = OuterLoop
nop
lctl = 0xa9 ;associate le0 with lc0 and le1 with lc1

d3 = a4
d4 = d3 + d3
x1 = d4
d4 = d4 + d3
x2 = d4

OuterLoop:

;>>>> img[index][k] = (img[index][k]>>1) - ((img[0][k]+img[2*index][k])>>2);
nop
x0 = d7
nop
a0 =h &*(dba + [x0])
nop
x0 = a4
d2 =sh *(a0 + [x1])
d3 =sh *(a0)
d4 =sh *(a0 + [x0])
|| d2 = d2 + d3
d4 = (d4 >>s 1)
d4 = d4 - (d2 >>s 2)
*(a0 + [x0]) =h d4

;>>>> img[0][k] += img[index][k];
d3 = d3 + d4
*(a0++=[x0]) =h d3
nop

InnerLoop:

;img[2*l+index][k] = (img[2*l+index][k]>>1) - ((img[2*l][k]+img[2*l+2*index][k])>>2);
d3 =sh *(a0 + [x0])
d4 =sh *(a0 + [x2])
d2 =sh *(a0 + [x1])
|| d5 = d3 + d4
d2 = (d2 >>s 1)
|| d4 =sh *(a0++=[x0])
d2 = d2 - (d5 >>s 2)
*(a0 + [x0]) =h d2
|| d5 = d2 + d4

;img[2*l][k] += ((img[2*l+index][k]+img[2*l-index][k]+2)>>1);
d3 = d3 + (d5 >>s 1)

InnerLoopEnd:
*(a0++=[x0]) =h d3

;>>>> img[YSIZE-index][k] = (img[YSIZE-index][k]-img[YSIZE-2*index][k])>>1;
d2 =sh *(a0 + [x1])
d3 =sh *(a0 + [x0])
d2 = d2 - d3
d2 = (d2 >>s 1)
*(a0 + [x1]) =h d2





NAWCWD TP 8442 


;>>>> img[YSIZE-2*index][k] += ((img[YSIZE-index][k]+img[YSIZE-3*index][k]+2)>>1);
d4 =sh *(a0)
; || d2 = d2 + 2
d5 = d2 + d4
d3 = d3 + (d5 >>s 1)
*(a0 + [x0]) =h d3

OuterLoopEnd:
d7 = d7 + 1

br = iprs
nop
nop


/*******************************************************************
*                                                                  *
* Function  : $subdec_horiz                                        *
* Args      : outerloop,innerloop                                  *
* Passed in : (d1,d2)                                              *
*                                                                  *
* Description:                                                     *
*   outerloop - the width of the coefficient patch                 *
*                                                                  *
*   innerloop - the height of the coefficient patch                *
*                                                                  *
*   $subdec_horiz will be called to perform the wavelet            *
*   decomposition in the horizontal direction.                     *
*                                                                  *
* Return Values:                                                   *
*   None                                                           *
*                                                                  *
*******************************************************************/


$subdec_horiz:

lctl = 0x0 ;reset looping capability

;set loop reload, counter
lr0 = d1 - 1
lr1 = d2 - 3

a4 = d2 ; innerloop
d7 = 0 ; k = 0

; Set up loop to iterate ((128 >> scale) - 2) times
le1 = InnerLoopEnd2 ;
ls1 = InnerLoop2 ;
le0 = OuterLoopEnd2 ;
ls0 = OuterLoop2 ;

nop
lctl = 0xa9 ;associate le0 with lc0 and le1 with lc1
nop
nop
OuterLoop2:

;>>>> img[k][index] -= ((img[k][0] + img[k][2*index])>>1);
; a0 a1 a2
d3 = a4
d3 = d3 * d7
d3 = d3 << 1
x0 = d3
nop
a0 =h &*(dba + [x0])
nop

d3 =sh *(a0)
d2 =sh *(a0 + [2])
d4 =sh *(a0 + [1])
|| d2 = d2 + d3
d4 = d4 - (d2 >>s 1)
*(a0 + [1]) =h d4

;>>>> img[k][0] = (img[k][0]<<1) + img[k][index];
d4 = d4 + (d3 << 1)
*(a0++=[1]) =h d4
nop

InnerLoop2:

;img[k][2*l+index] -= ((img[k][2*l]+img[k][2*l+2*index])>>1);
d3 =sh *(a0 + [1])
d4 =sh *(a0 + [3])
d2 =sh *(a0 + [2])
|| d5 = d3 + d4
d2 = d2 - (d5 >>s 1)
|| d4 =sh *(a0++=[1])
*(a0 + [1]) =h d2
|| d5 = d2 + d4

;img[k][2*l] = (img[k][2*l]<<1) + ((img[k][2*l+index]+img[k][2*l-index])>>1);
d3 = d3 << 1
d3 = d3 + (d5 >>s 1)

InnerLoopEnd2:
*(a0++=[1]) =h d3

;>>>> img[k][XSIZE-index] -= img[k][XSIZE-2*index];
d2 =sh *(a0 + [2])
d3 =sh *(a0 + [1])
d2 = d2 - d3
*(a0 + [2]) =h d2

;>>>> img[k][XSIZE-2*index] = (img[k][XSIZE-2*index]<<1) +
;     ((img[k][XSIZE-index]+img[k][XSIZE-3*index])>>1);



d4 =sh *(a0)
; || d2 = d2 + 2
d5 = d2 + d4
d3 = d3 << 1
d3 = d3 + (d5 >>s 1)
*(a0 + [1]) =h d3

OuterLoopEnd2:
d7 = d7 + 1

br = iprs
nop
nop ; branch occurs here
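
The comments in $subdec_horiz describe an in-place predict/update (lifting-style) pass over each row: odd samples become detail values, even samples become scaled averages, and the first and last pairs are handled separately. The C sketch below transcribes those commented formulas for a single row with stride 1, as an illustration rather than a translation of the assembly. The names floor_half and subdec_row_sketch are hypothetical. The helper avoids relying on right shifts of negative values, which are implementation-defined in C.

```c
/* Arithmetic (flooring) shift right by 1, written portably. */
static int floor_half(int v)
{
    return (v >= 0) ? (v >> 1) : -((-v + 1) >> 1);
}

/* One in-place horizontal pass over a row of n samples (n even, n >= 4). */
static void subdec_row_sketch(short *row, int n)
{
    int l;

    /* First pair: no left neighbour for the even sample. */
    row[1] = (short)(row[1] - floor_half(row[0] + row[2]));
    row[0] = (short)(row[0] * 2 + row[1]);

    /* Interior pairs: (n/2 - 2) iterations, matching the inner loop count. */
    for (l = 1; 2 * l + 2 < n; l++) {
        row[2 * l + 1] = (short)(row[2 * l + 1] - floor_half(row[2 * l] + row[2 * l + 2]));
        row[2 * l]     = (short)(row[2 * l] * 2 + floor_half(row[2 * l + 1] + row[2 * l - 1]));
    }

    /* Last pair: no right neighbour for the odd sample. */
    row[n - 1] = (short)(row[n - 1] - row[n - 2]);
    row[n - 2] = (short)(row[n - 2] * 2 + floor_half(row[n - 1] + row[n - 3]));
}
```

On a constant row the detail (odd) samples all become 0 and the even samples are doubled, which is the expected behaviour for a smooth input.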



/*****************************************************************************
**
** j_enc.c (PP Program)
**
** Written by Jim Witham, Code 472300D, 939-3599
**
** PP Program that orchestrates the generation of the bitstream. Processes
** 6 Zero Trees per partition.
**
*****************************************************************************/

#include <mvp.h>
#include "arith.h"
#include "modlp.h"

#define ASSEM
#define ASSEM3
#define ASSEM2

/* #define NO_MULTI_BITPLANE */ 

#define XSIZE 512 
#define YSIZE 256 
#define NSCALES 5 
#define AX 16 /* = XSIZE/BS */ 

#define AY 8 /* = YSIZE/BS */

#define BS 32 /* = 2 ^ NSCALES */

#define maxsyms 240 

/* REAL-TIME: PROCESSES 2 ZEROTREES PER PARTITION: TRUNCATED */ 

unsigned char emubrk; 

extern short *img; 
extern short *coeff_block; 
extern short *stats_val; 
extern short *ztl; 
extern unsigned char *stats_flag; 
extern unsigned int *stomp_flag; 
extern unsigned int *stomp_ztl; 
extern unsigned char *byte_stream; 
extern unsigned char *tbuf; 

extern unsigned short *list; 
unsigned short list_index; 

extern short *qstats_val; 
extern unsigned short *qindex; 

extern unsigned char qcount; 

extern int No_of_chars; 

extern int EOF_symbol; /* Index of EOF symbol */ 

extern int No_of_symbols; /* total # of symbols */ 

/* TRANSLATION TABLES BETWEEN CHARS AND SYMBOL INDEXES */ 

extern unsigned char char_to_index[Max_No_of_chars];
extern unsigned char index_to_char[Max_No_of_symbols+1];
extern short int cum_freq[Max_No_of_symbols+1];

extern int buffer_index; /* JCW */ 
extern unsigned short bit_index; 

extern unsigned short T_BYTES; 

extern short maxval; 

typedef struct pstr 

{ 

unsigned char d,i; 

} PSTR; 

/* Each element can be read as needed */ 
extern unsigned short *ALLOC; 

extern int FIRST_ENC, FCNT_ENC; /* 1st pass bite size */

extern unsigned char *passes_d; 

/* Required on-line storage for high speed */ 

extern int STOP; 
extern int BYTE_CNT; 
extern unsigned short THRESH; 
extern int BITE; 
extern int SIG_COEF; 

/* extern int SIG[15]; */ 

unsigned char PASS_ENC; 
unsigned short TMASK; 

int SHFT_DN,RB_ENC; 
short BYTE_TOTAL; 

extern int whoami();

unsigned char buffer; /* Bits buffered for output */ 

unsigned char bits_to_go; /* # bits free in buffer */ 

unsigned char pruned_children[8]; /* used to keep count of the number of pruned */
                                  /* children within a zero tree */

unsigned char pruned[8]; /* used to keep count whether a zero tree is pruned */

unsigned char total_links; /* keeps count of pruned trees */

unsigned char link_list[6]; /* list with unpruned trees in it */

void writeout(int sym); 

void start_model(int nchars);
void update_model(int symbol); 

void encode_symbol(int symbol); 
void bit_plus_follow(int bit); 
void output_bit(int bit); 

void new_dbits(int m, int c, int number, int s); 
void new_dbits2(int m, int s);
void comp_ztr(int s);

void subpass(int t); 

void diff1(); 
void diff2(); 
void suml(); 
void sum2(); 

/* CURRENT STATE OF ENCODING */ 

int low, high; /* Ends of current code region */ 

short bits_to_follow; /* Number of opposite bits to output 

after the next bits */ 

extern shared int local_maxval[4];
extern shared int global_maxval;

/* int time_cdwo, time_cdnwo, time_cd2wo, time_sym; */

/* int time_cd2nwo,time_spwo,time_spnwo; */

unsigned char *pp_stop_encode = (unsigned char *) 0x010007D4;

/* START ENCODING A STREAM OF SYMBOLS */ 

void start_encoding()
{
    low = 0;                /* Full code range */
    high = Top_value;
    bits_to_follow = 0;     /* No bits to follow */
}

void writeout(sym)
int sym;
{
    if (BYTE_CNT++ < T_BYTES)
    {
        byte_stream[buffer_index++] = sym; /* Output filled buffer JCW*/
        BYTE_TOTAL++;
    }
    else
        STOP = 1;
}

/* INITIALIZE FOR BIT OUTPUT */

void start_outputing_bits()
{
    buffer = 0;             /* Buffer is empty at start */
    bits_to_go = 8;
}



/* INPUT_BLOCK */

void input_block(s,p)
int s,p;
{
    int i,j,k,l;

    i = s * 256;

    for (k=NSCALES-2; k>0; k--)
    {
        if (k==(NSCALES-2))
            for (j=0;j<(1<<(NSCALES-1));j+=(1<<k))
                for (l=0;l<(1<<NSCALES);l+=(2<<k))
                    stats_val[i++] = coeff_block[p*512 + j*32+l];

        for (j=0;j<(1<<(NSCALES-1));j+=(1<<k))
            for (l=(1<<k);l<(1<<NSCALES);l+=(2<<k))
                stats_val[i++] = coeff_block[p*512 + j*32+l];

        for (j=(1<<(k-1));j<(1<<(NSCALES-1));j+=(1<<k))
            for (l=0;l<(1<<NSCALES);l+=(2<<k))
                stats_val[i++] = coeff_block[p*512 + j*32+l];

        for (j=(1<<(k-1));j<(1<<(NSCALES-1));j+=(1<<k))
            for (l=(1<<k);l<(1<<NSCALES);l+=(2<<k))
                stats_val[i++] = coeff_block[p*512 + j*32+l];
    }
}
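
The nested loops of input_block define the order in which one 16 x 32 coefficient block is copied into a 256-entry tree: the coarsest level contributes four passes (including the low-pass pass at offset 0), and each finer level contributes three. The sketch below reproduces only the index arithmetic so the ordering can be inspected on a host machine. The name build_scan_order and the pass variable are hypothetical, and the row stride of 32 is taken from the j*32 term in the listing.

```c
#define NSCALES 5
#define ROW_STRIDE 32   /* row length implied by the j*32 + l addressing */

/* Fill order[] with block offsets in input_block's scan order; return the count. */
static int build_scan_order(int order[], int max_entries)
{
    int i = 0, j, k, l, pass;

    for (k = NSCALES - 2; k > 0; k--) {
        /* Pass 0 runs only at the coarsest level; passes 1..3 run at every level. */
        for (pass = (k == NSCALES - 2) ? 0 : 1; pass < 4; pass++) {
            int j0 = (pass >= 2) ? (1 << (k - 1)) : 0;   /* starting row    */
            int l0 = (pass & 1) ? (1 << k) : 0;          /* starting column */

            for (j = j0; j < (1 << (NSCALES - 1)); j += (1 << k))
                for (l = l0; l < (1 << NSCALES); l += (2 << k))
                    if (i < max_entries)
                        order[i++] = j * ROW_STRIDE + l;
        }
    }
    return i;
}
```

For NSCALES = 5 the scan yields 16 + 48 + 192 = 256 offsets, which agrees with the s * 256 tree spacing used throughout the encoder.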

unsigned char sym_array[256];
unsigned short sym_index; 

/**********************************************************************
COMP_DBITS
**********************************************************************/

void comp_dbits(s)
int s;
{
    int sym,tflg,apx,t,m0;
    int i,m;

    m = s * 256;

    if ((stats_flag[s * 256]&6)==0)
    {
        tflg = ((TMASK&ztl[s * 64]) == 0);
        t = (TMASK&abs(stats_val[m]))>>SHFT_DN;
        sym = t + (((t!=0)&&(stats_val[m]>0)) ? ((1<<BITE)-1) : 0) + ((~tflg)&1);

        if ((t>0)&&(list_index<254))
            list[list_index++] = m;

        sym_array[sym_index++] = sym;

        if (t>0)
        {
            qstats_val[qcount] = abs(stats_val[m])-(((t*THRESH)>>(BITE-1))+(THRESH>>BITE));
            qindex[qcount++] = m;
        }

        if (tflg!=0)
        {
            pruned_children[s] = 1;
            stats_flag[m + 1] = stats_flag[m + 1] | (tflg<<1);
            stats_flag[m + 2] = stats_flag[m + 2] | (tflg<<1);
            stats_flag[m + 3] = stats_flag[m + 3] | (tflg<<1);
        }

        stats_flag[m] = (t!=0)<<2;
    }
    else if ((stats_flag[m]&2) != 0)
    {
        stats_flag[m] = stats_flag[m] & 252;
        stats_flag[m + 1] = stats_flag[m + 1] | 2;
        stats_flag[m + 2] = stats_flag[m + 2] | 2;
        stats_flag[m + 3] = stats_flag[m + 3] | 2;
        pruned_children[s] = 1;
    }
}


/**************************************************************************/

void comp_dbits2(m)
int m;
{
    int sym,t,tflg;

    t = (TMASK&abs(stats_val[m]))>>SHFT_DN;
    sym = t + (((t!=0)&&(stats_val[m]>0)) ? ((1<<BITE)-1) : 0) + (t!=0);

    /* writeout(sym); */
    sym_array[sym_index++] = sym;

    if (sym_index > maxsyms)
    {
        do_syms();
        sym_index = 0;
    }

    /* stats_val[m] = (t>0) ? (abs(stats_val[m])-((t*THRESH)/(1<<(BITE-1))+THRESH/(2<<(BITE-1)))) : stats_val[m]; */

    stats_val[m] = (t>0) ? (abs(stats_val[m])-(((t*THRESH)>>(BITE-1))+(THRESH>>BITE))) : stats_val[m];
    stats_flag[m] = (t!=0)<<2;
}
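
The symbol arithmetic in comp_dbits2 is easier to follow on concrete numbers. The sketch below isolates the two expressions: the symbol index built from the masked magnitude and sign, and the residual left after subtracting (t*THRESH >> (BITE-1)) + (THRESH >> BITE). Both function names are hypothetical, and the global state (TMASK, SHFT_DN, BITE, THRESH) is passed in as parameters so that the block stands alone.

```c
#include <stdlib.h>

/* Symbol for one coefficient: 0 if insignificant, otherwise magnitude bits plus sign offset. */
static int dominant_symbol(int value, unsigned tmask, int shft_dn, int bite)
{
    int t = (int)((tmask & (unsigned)abs(value)) >> shft_dn);

    return t + ((t != 0 && value > 0) ? ((1 << bite) - 1) : 0) + (t != 0);
}

/* Residual magnitude stored back once a coefficient is found significant (t > 0). */
static int residual_after(int value, int t, int thresh, int bite)
{
    return abs(value) - (((t * thresh) >> (bite - 1)) + (thresh >> bite));
}
```

With BITE = 1, THRESH = TMASK = 0x40 and SHFT_DN = 6, a coefficient of 70 maps to symbol 3, a coefficient of -70 maps to symbol 2, and a coefficient of 10 maps to symbol 0. The residual for 70 is 70 - (64 + 32) = -26.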
/**********************************************************************
CODE_DLIST
**********************************************************************/

void code_dlist()
{
    int sym,i,p,s;
    unsigned short t,tt,j;
    short saveme;
    unsigned char savemec;
    register unsigned int m,c;

    /* Perform Dominant pass */
    for (s=0;s<6;s++)
        comp_dbits(0,1,2,3,s);

    if (STOP==0)
        for(m=1;m<4;m++)
        {
            c = 4*m;
            for (s=0;s<6;s++)
            {
#ifdef ASSEM
                if ((stats_flag[m + s*256]&2)==2)
                    new_dbits_nwo(m,c,2,s);
                else if ((stats_flag[m + s*256]&6)==0)
                    new_dbits_wo(m,c,2,s);
#else
                if ((stats_flag[m + s*256]&6) != 4)
                    comp_dbits(m,c,c+1,c+2,c+3,s);
#endif
            }
        }

    if (STOP==0)
        for(m=4;m<16;m++)
        {
            c = 8*(m/2) + 2*(m%2);
            for (s=0;s<6;s++)
            {
#ifdef ASSEM
                if ((stats_flag[m + s*256]&2)==2)
                    new_dbits_nwo(m,c,4,s);
                else if ((stats_flag[m + s*256]&6)==0)
                    new_dbits_wo(m,c,4,s);
#else
                if ((stats_flag[m + s*256]&6) != 4)
                    comp_dbits(m,c,c+1,c+4,c+5,s);
#endif
            }
        }

    if (STOP==0)
        for(m=16;m<64;m++)
        {
            c = 16*(m/4) + 2*(m%4);
            for (s=0;s<6;s++)
            {
#ifdef ASSEM
                if ((stats_flag[m + s*256]&2)==2)
                    new_dbits_nwo(m,c,8,s);
                else if ((stats_flag[m + s*256]&6)==0)
                    new_dbits_wo(m,c,8,s);
#else
                if ((stats_flag[m + s*256]&6) != 4)
                    comp_dbits(m,c,c+1,c+8,c+9,s);
#endif
            }
        }

    /* Eliminate if NSCALES = 3 */
    if (STOP==0)
        for(m=64;m<256;m++)
        {
            for (s=0;s<6;s++)
            {
                if ((stats_flag[m+s*256]&2)==2)
                {
                    stats_flag[m+s*256] = stats_flag[m+s*256] & 252;
                }
                else if ((stats_flag[m+s*256]&6)==0)
                {
#ifdef ASSEM3
                    new_dbits2(m,s);
#else
                    comp_dbits2(m+s*256);
#endif
                }
            }
        }

    if (sym_index>0)
        do_syms();
    sym_index = 0;

    start_model(2);

    if (STOP==0)
        PASS_ENC += BITE;
    t = THRESH>>BITE;
    THRESH = THRESH>>BITE;

    if (PASS_ENC == BITE)
    {
        BITE = RB_ENC;
#ifdef NO_MULTI_BITPLANE
        BITE = 1;
#endif
        TMASK = THRESH;
        for(m=1;m<BITE;m++)
            TMASK = TMASK | (THRESH>>m);
    }
    else
    {
        BITE = 1;
        TMASK = THRESH;
    }

    SHFT_DN -= BITE;

    /* Subordinate Pass */
    subpass(t);

    if (sym_index>0)
        do_syms();
    sym_index = 0;

    start_model(1<<(BITE+1));
}


/* TIT********************************************************************* 

P_C0DE 

**+*+**+++****************+******+*+*****+*************+*********+**** * / 

void p_code() 

{ 

int k,l,i,j,mask,bite,pmin; /* maxval */ 

int aa,ab; 

/* DEFINE THIS */ 

BYTE_TOTAL = 12; 

sym_index = 0; 

aa = 0xffffffff; ab = 0x00; 

/* Compute mask */ 

FIRST_ENC = tbuf[0] ; 

if (FIRST_ENC == 6) 

{ 

bite = 3; 

RB_ENC = 3; 

} 

else if (FIRST_ENC == 5) 

{ 

bite = 3; 

RB_ENC = 2; 

} 

else if (FIRST_ENC==4) 

{ 

bite =2; 

RB_ENC = 2; 

} 

else 

{ 
bite = FIRST_ENC; 

RB_ENC = 1; 

} 

#ifdef NO_MULTI_BITPLANE 
BITE = 1; 

#endif 

pmin = 100; 

maxval = 0; 

for (j=0;j<10;j++) 

{ 

for (i=0;i<6;i=i+2) 

{ 

/* wait for new coefficients */ 

while ((INTFLG & (1<<20)) == 0); 

INTFLG = 1<<20; 

input_block(i, 0); 
input_block(i + 1, 1); 

/* tell MP done inputting this block of coefficients */ 
asm(" x2 = 0x00002100"); 

asm(" cmnd = x2"); 

} 

/* wait for buffer to be written in */ 
while ((INTFLG & (1<<20)) == 0); 

INTFLG = 1<<20; 

/* memory = coefficients - buffer */ 
diff1(); 

/* determine maxval */ 

for (k=0;k<1536;k++) 

{ 

if (abs(stats_val[k]) > maxval) 
maxval = abs(stats_val[k]); 

} 

local_maxval[whoami()] = maxval; 

/* tell MP done with differencing and maxval calculation */ 

asm(" x2 = 0x00002100"); 

asm(" cmnd = x2"); 

} /* end for j */ 

/* wait for global maxval computation to complete */ 

while ((INTFLG & (1<<20)) == 0); 

INTFLG = 1<<20; 

maxval = 1<<global_maxval; 

mask = maxval; 
for(k=1;k<bite;k++) 

mask = mask | (maxval>>k); 

/* Main Loop: Continue until stop condition is reached */ 

while (*pp_stop_encode == 0) 

{ 

/* emubrk =1/ */ 

BITE = bite; 

#ifdef NO_MULTI_BITPLANE 
BITE = 1; 

#endif 

PASS_ENC = 0; 

STOP = 0; 

BYTE_CNT = 0; 

THRESH = maxval; 

SHFT_DN = global_maxval - BITE + 1; 

TMASK = mask; 
buffer_index = 0; 

list_index = 0; 

start_model(1<<(BITE+1)); 
start_outputing_bits(); 
start_encoding(); 

/* wait for new coefficients */ 
while ((INTFLG & (1<<20)) == 0); 

INTFLG = 1<<20; 

if (*pp_stop_encode == 0) 

{ 

for (i=0;i<6;i++) 

{ 

comp_ztr(i); 

} 

T_BYTES = ALLOC[0]; 

for (bit_index=0;bit_index<T_BYTES;bit_index++) byte_stream[bit_index] = 0; 

bit_index = 0; 

for(j=0;j<384;j++) 
stomp_flag[j] = 0; 

emubrk = 0x21; 

code_dlist2(); 

emubrk = 0x22; 

/* emubrk = 3; */ 

if (ALLOC[0]<3) 
passes_d[0] = 0; 
else if (ALLOC[0]==3) 
passes_d[0] = PASS_ENC-4; 
else if (ALLOC[0]<7) 
passes_d[0] = PASS_ENC-2; 
else 
{ 
passes_d[0] = PASS_ENC-1; 
if (PASS_ENC-1 < pmin) 
pmin = PASS_ENC-1; 

} 

/* tell MP done with this block of pixels */ 

asm(" x2 = 0x00002100"); 

asm(" cmnd = x2"); 

/* wait for coefficients to be written in */ 
while ((INTFLG & (1<<20)) == 0); 

INTFLG = 1<<20; 

/* special differencing */ 

/* memory = coefficients before quant. - leftover coeff after quant */ 
/* 0x0000 = 0x0c00->0x0ffe - 0x0000->0xbfe */ 

diff2(); 

/* tell MP done with the differencing */ 

asm(" x2 = 0x00002100"); 

asm(" cmnd = x2"); 

/* wait for coefficients to be written in */ 
while ((INTFLG & (1<<20)) == 0); 

INTFLG = 1<<20; 

sum1(); 

/* tell MP done with the summing */ 

asm(" x2 = 0x00002100"); 

asm(" cmnd = x2"); 

} /*end if */ 

} /* end while */ 

/* Transmit next BITE size */ 


if (pmin==0) 

FIRST_ENC -= 1; 

else if ((pmin > FIRST_ENC)&&(FCNT_ENC > 2)&&(pmin<7)) 

{ 

FCNT_ENC = 0; 

FIRST_ENC = pmin-1; 

} 

else if (pmin > FIRST_ENC) 

FCNT_ENC++; 

else if (pmin<=FIRST_ENC) 

{ 

FIRST_ENC = pmin; 

FCNT_ENC = 0; 

} 

tbuf[1] = FIRST_ENC; 

/****************************************************************************** 
** 
** main.c (PP Program) 
** 
** Written by Jim Witham, Code 472300D, 939-3599 
** 
** Main PP Program that calls all the other routines to perform the wavelet 
** decomposition and bitstream generation. 
** 
******************************************************************************/ 


#include "modlp.h" 


#define XCOLS 8 /* Max Columns that can be held in internal memory */ 

#define YROWS 4 /* Max Rows that can be held in internal memory */ 


#define XSIZE 512 /* Max Image Size */ 

#define YSIZE 240 /* Max Image Height */ 

#define ML 50 /* Max order of filter allowed */ 

#define NSCALES 4 
#define AX 16 /* = XSIZE/BS */ 

#define AY 8 /* = YSIZE/BS */ 

#define BS 32 /* = 2 ^ NSCALES */ 


#define ENCODE 
#define ENCODE_STREAM 

#define DISPLAY 


unsigned short T_BYTES; 
unsigned char *pic; 
short *img; 


short *stats_val; 
short *stats_apx; 
unsigned char *stats_flag; 
unsigned int *stomp_flag; 
unsigned int *stomp_ztl; 
unsigned char *byte_stream; 
short *coeff_block; 
unsigned short *ztl; 
unsigned char *tbuf; 
unsigned short *ALLOC; 

unsigned short *list; 

short *qstats_val; 
unsigned short *qindex; 

unsigned char qcount = 0; 


int No_of_chars; 


int EOF_symbol; /* Index of EOF symbol */ 
int No_of_symbols; /* total # of symbols */ 

/* TRANSLATION TABLES BETWEEN CHARS AND SYMBOL INDEXES */ 


unsigned char char_to_index[Max_No_of_chars]; /* JCW old version was int */ 
unsigned char index_to_char[Max_No_of_symbols+1]; /* JCW old version was int */ 
short int cum_freq[Max_No_of_symbols+1]; /* JCW old version was int */ 
int buffer_index; /* JCW */ 

extern void subdec_vert(int outerloop,int innerloop); 
extern void subdec_horiz(int outerloop,int innerloop); 

extern int calc_n_sub_mean(int oldmean); 
extern void add_n_conv8(int oldmean); 

/* extern void subsyn_vert(int outerloop,int innerloop); 
extern void subsyn_horiz(int outerloop,int innerloop); */ 

extern int whoami(); 

/* ************************* MAIN program ************************** */ 

cregister extern volatile unsigned int INTFLG; 

short column_outerloop[5] = { 8, 16, 32, 16, 8 }; 

short row_outerloop[5] = { 4, 8, 16, 8, 4 }; 

short column_innerloop[5] = { 120, 60, 30, 15, 8 }; 

/* short column_innerloop[5] = { 59, 29, 14, 6, 8 }; */ 

short row_innerloop[5] = { 256, 128, 64, 32, 16 }; 

unsigned char vert_loop_index[6] = { 16, 4, 1, 1, 1 }; 
unsigned char horiz_loop_index[6] = { 15, 5, 1, 1, 1 }; 

short maxval; 

extern shared int mean_pp[4]; 
extern shared int global_mean; 

unsigned char *passes_d; 

/* extern int RB_ENC,RB_DEC; */ 

extern unsigned char PASS_ENC, PASS_DEC; 

int STOP; 
int BYTE_CNT; 
unsigned short THRESH; 
int BITE; 
int SIG_COEF; 

/* int SIG[15]; */ 

int FIRST_ENC,FIRST_DEC; 
int FCNT_ENC, FCNT_DEC; 

unsigned int getaway_address; 
unsigned char quick_getaway = 0; 

main () 

{ 


int k,l; 
int mean; 

/* initialize pic[], img[], stats_val[], stats_flag[], byte_stream[], 
coeff_block[] pointers */ 

asm(" d1 = &*(dba)"); 
asm(" *(xba+$pic) = d1"); 
asm(" *(xba+$img) = d1"); 
asm(" *(xba+$stats_val) = d1"); 
asm(" *(xba+$stats_apx) = d1"); 

asm(" d1 = &*(dba + 0x8000)"); 
asm(" *(xba+$stats_flag) = d1"); 
asm(" *(xba+$stomp_flag) = d1"); 
asm(" *(xba+$coeff_block) = d1"); 

asm(" x0 = 0x8602"); 
asm(" nop"); 
asm(" d1 = &*(dba + x0)"); 
asm(" *(xba+$list) = d1"); 

asm(" d1 = &*(dba + 0xc00)"); 
asm(" *(xba+$ztl) = d1"); 
asm(" *(xba+$stomp_ztl) = d1"); 

asm(" d1 = &*(pba + 0x630)"); 
asm(" *(xba+$byte_stream) = d1"); 

asm(" d1 = &*(pba + 0x620)"); 
asm(" *(xba+$passes_d) = d1"); 

asm(" d1 = &*(pba + 0x5fc)"); 
asm(" *(xba+$tbuf) = d1"); 

asm(" d1 = &*(pba + 0x600)"); 
asm(" *(xba+$ALLOC) = d1"); 

asm(" d1 = &*(pba + 0x100)"); 
asm(" *(xba+$qstats_val) = d1"); 

asm(" d1 = &*(pba + 0x180)"); 
asm(" *(xba+$qindex) = d1"); 

FIRST_ENC = 4; 

FIRST_DEC = 4; 

tbuf[0] = FIRST_DEC; 

FCNT_ENC = 0; 

FCNT_DEC = 0; 

/* Clear the message interrupt flag that comes from the MP, just in case */ 

INTFLG = 1<<20; 
quick_getaway = 0; 

while (1) 

{ 

#ifdef ENCODE 
mean = 0; 

/* Decompose image */ 

for(k=0;k<(NSCALES+1);k++) 

{ 

for (l=0; l<horiz_loop_index[k]; l++) 

{ 

/* wait for new pixels */ 
while ((INTFLG & (1<<20)) == 0); 

INTFLG = 1<<20; 
if (k==0) 

mean = mean + calc_n_sub_mean(global_mean); 

subdec_horiz(row_outerloop[k], row_innerloop[k]); 

/* tell MP done with this block of pixels */ 

asm(" d7 = 0x00002100"); 

asm(" cmnd = d7"); 

} 

/* Put out local mean for mp to do global mean calculation */ 
mean_pp[whoami()] = mean; 
if (k!=NSCALES) 

for (l=0;l<vert_loop_index[k];l++) 

{ 

/* wait for new pixels */ 
while ((INTFLG & (1<<20)) == 0); 

INTFLG = 1<<20; 

subdec_vert(column_outerloop[k],column_innerloop[k]); 

/* tell MP done with this block of pixels */ 

asm(" d7 = 0x00002100"); 

asm(" cmnd = d7"); 

} 

} 

#endif 

/* Fake it for now */ 

/* maxval = 0x800<<3; */ 

/* Form data stream */ 

#ifdef ENCODE_STREAM 
p_code(); 
#endif 

} 

} 
/************************************************************************* 
** 
** mean2.s (PP Program) 
** 
** Written by Jim Witham, Code 472300D, 939-3599 
** 
** File that contains the following assembly language subroutines: 
** 
** $calc_n_sub_mean 
** $whoami 
** 
*************************************************************************/ 


.global $calc_n_sub_mean 
.global $whoami 


/******************************************************************** 
* 
* Function : $calc_n_sub_mean 
* Args : oldmean 
* Passed in : d1 
* 
* Description: 
* oldmean - the mean of the previous image 
* 
* $calc_n_sub_mean will be called both to calculate the sum 
* of the pixels in this patch and to subtract off the 
* previous image's mean (an 8 bit value minus a 16 bit value, 
* yielding a 16 bit result, which is then further processed 
* by shifting it to the left 2 places). 
* 
* Return Values: 
* d5 - returns the sum of this patch of pixels. 
* 
********************************************************************/ 


$calc_n_sub_mean: 

        d5 = 0 

        a8 = &*(dba) + 4094 
        a0 = &*(dba) + 2047 

        d1 = 0 - (d1 << 2)      ; make mean so it can be subtracted 

        le2 = loopend 
        lrs2 = 510 

        d2 =ub *a0-- 
        d5 = d5 + d2 

; loop starts 

        x1 = d1 + (d2 << 2) 
     || d2 =ub *a0-- 
        *a8-- =h x1 
     || d5 = d5 + d2 
        x1 = d1 + (d2 << 2) 
     || d2 =ub *a0-- 
        *a8-- =h x1 
     || d5 = d5 + d2 
        x1 = d1 + (d2 << 2) 
     || d2 =ub *a0-- 
        *a8-- =h x1 
     || d5 = d5 + d2 
        x1 = d1 + (d2 << 2) 
     || d2 =ub *a0-- 
loopend: 
        *a8-- =h x1 
     || d5 = d5 + d2 

        x1 = d1 + (d2 << 2) 
     || d2 =ub *a0-- 
        *a8-- =h x1 
     || d5 = d5 + d2 
        x1 = d1 + (d2 << 2) 
     || d2 =ub *a0-- 
        *a8-- =h x1 
     || d5 = d5 + d2 
        x1 = d1 + (d2 << 2) 
     || d2 =ub *a0-- 
        *a8-- =h x1 
     || d5 = d5 + d2 

        x1 = d1 + (d2 << 2) 
        *a8-- =h x1 

        br = iprs 
        nop 

nop 
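
For readers less familiar with the PP's algebraic assembly, the arithmetic performed by `$calc_n_sub_mean` can be modelled in plain C. This is only a sketch under stated assumptions: the name `calc_n_sub_mean_model`, the explicit input and output arrays, and the length parameter are hypothetical and do not appear in the listing (the assembly works in place on internal RAM, expanding bytes to halfwords from the back of the buffer to the front).

```c
#include <stddef.h>

/* Hypothetical C model of $calc_n_sub_mean (not part of the listing):
   returns the sum of the n 8-bit pixels and writes
   (pixel << 2) - (oldmean << 2) for each pixel as a 16-bit value. */
static int calc_n_sub_mean_model(const unsigned char *pix, short *out,
                                 size_t n, int oldmean)
{
    int sum = 0;
    int neg_mean = 0 - (oldmean * 4);   /* "make mean so it can be subtracted" */
    size_t i;

    for (i = n; i-- > 0; ) {            /* back to front, as in the assembly */
        sum += pix[i];
        out[i] = (short)(neg_mean + pix[i] * 4);
    }
    return sum;
}
```

The returned sum corresponds to the value left in `d5`, which `main()` accumulates into `mean` for the MP's global mean calculation.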


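/******************************************************************** 
* 
* Function : $whoami 
* Args : none 
* Passed in : 
* 
* Description: 
* $whoami will be called to determine which PP this is. 
* 
* Return Values: 
* d5 - returns the PP number assigned to this PP. 
* 
********************************************************************/ 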


$whoami: 


        d5 = comm & 0x03 
        br = iprs 
        nop 
        nop 
/******************************************************************* 
** 
** modlp.c (PP Program) 
** 
** Written by Jim Witham, Code 472300D, 939-3599 
** 
** Contains start_model subroutine. This subroutine initializes 
** the translation tables and the frequency counters for the 
** arithmetic coder. 
** 
*******************************************************************/ 

/* ADAPTIVE SOURCE MODEL */ 

#include "modlp.h" 

#include "arith.h" 

short int freq[Max_No_of_symbols+1]; /* Symbol frequencies */ 

extern int No_of_chars; 

extern int EOF_symbol; /* Index of EOF symbol */ 

extern int No_of_symbols; /* total # of symbols */ 

/* TRANSLATION TABLES BETWEEN CHARS AND SYMBOL INDEXES */ 

extern unsigned char char_to_index[Max_No_of_chars]; 

extern unsigned char index_to_char[Max_No_of_symbols+1]; 

extern short int cum_freq[Max_No_of_symbols+1]; 

extern int low,high; 

extern short bits_to_follow; 

extern unsigned char bits_to_go; 

extern unsigned char buffer; 

extern int BYTE_CNT; 

extern unsigned short T_BYTES; 

extern unsigned char *byte_stream; 

extern int buffer_index; 

extern short BYTE_TOTAL; 

extern int STOP; 

/* OUTPUT A BIT */ 

void output_bit(bit) 
int bit; 

{ 

buffer >>= 1; /* Put bit in top of buffer */ 

if (bit) 

buffer |= 0x80; 
bits_to_go -= 1; 
if (bits_to_go == 0) 

{ 

if (BYTE_CNT++ < T_BYTES) 

{ 

byte_stream[buffer_index++] = buffer; /* Output filled buffer JCW*/ 
BYTE_TOTAL++; 

} 

else 

STOP = 1; 
bits_to_go = 8; 
} 

} 
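
The packing order used by `output_bit()` is easy to get backwards, so a stand-alone sketch may help. The helper below, `pack_bits`, is hypothetical (it is not in the listing) and simplifies the byte-budget handling: where the listing sets `STOP` and keeps going, this version just stops. Bits enter at the top of the byte and earlier bits shift down, so the first bit of each group of eight ends up in bit 0 of the stored byte.

```c
#include <stddef.h>

/* Hypothetical stand-alone version of the packing in output_bit():
   shift the buffer right, put the new bit in bit 7, and emit a byte
   after every eighth bit. Returns the number of bytes written.
   Trailing bits that do not fill a byte are not flushed. */
static size_t pack_bits(const int *bits, size_t nbits,
                        unsigned char *out, size_t max_bytes)
{
    unsigned char buffer = 0;
    int bits_to_go = 8;
    size_t nbytes = 0;
    size_t i;

    for (i = 0; i < nbits; i++) {
        buffer >>= 1;                   /* make room at the top */
        if (bits[i])
            buffer |= 0x80;
        bits_to_go -= 1;
        if (bits_to_go == 0) {
            if (nbytes >= max_bytes)
                break;                  /* byte budget exhausted */
            out[nbytes++] = buffer;
            bits_to_go = 8;
        }
    }
    return nbytes;
}
```

Feeding the bit sequence 1,1,0,0,0,0,0,1 produces the single byte 0x83: the two leading ones land in bits 0 and 1, and the final one in bit 7.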

unsigned short bit_index; 

/* INITIALIZE THE MODEL */ 

void start_model(nchars) 
int nchars; 

{ 

int i; 

/* Initialize number of chars */ 

No_of_chars = nchars; 
No_of_symbols = nchars + 1; 

/* Setup translation tables */ 

for(i=0;i<No_of_chars;i++) 

{ 

char_to_index[i] = i + 1; 
index_to_char[i+1] = i; 

} 

/* Initialize frequency counts */ 

for(i=0;i<=No_of_symbols;i++) 

{ 

freq[i] = 1; 

cum_freq[i] = No_of_symbols-i; 

} 

freq[0] = 0; 

} 
/* INTERFACE TO THE MODEL */ 

/* Set of symbols that may be encoded */ 

#define Max_No_of_chars 16 
#define Max_No_of_symbols 17 

/* Cumulative Frequency Table */ 

#define Max_frequency 75 /* 16383 Maximum allowed frequency cnt 2^14-1 */ 
/****************************************************************************** 
** 
** mp.c (MP Program) 
** 
** Written by Jim Witham, Code 472330D, 927-1440 
** 
** MP Program that orchestrates data movement and kicks off PP's to implement 
** the balanced wavelet algorithm written by Chuck Creusere. 
** 
** Interframe Encode version 
*/ 


#include <stdlib.h> 
#include <stdio.h> 
#include <mvp.h> 


#include <mvp_hw.h> 
#include <mp_ptreq.h> 


#include "icl.h" 
#include "vil24.h" 
#include "vol.h" 
#include "bgl.h" 
#include "pcic80.h" 
#include "cil.h" 


/* define compression ratio. Ratio is actually #:1 compression */ 
#define HOST 


/* #define variable_compression */ 
#define compression 40 
#define CAMERA 

/* #define INTERLEAVE_CAMERA */ 
#define CHANNEL_LIMIT 

#define ENCODE 

#define ENCODE_STREAM 

#define DISPLAY 
#define SHOWDISPLAY 

/* #define INTER_DISPLAY */ 

/* #define OLONDISPLAY */ 

#define headersize 0 

/* #define DEBUG */ 

#define XSIZE 512 
#define YSIZE 240 

#define AX 16 
#define AY 15 

/* 
** Define a bit mask for host request bit 
*/ 

#define SigBit(x) (((UINT32)1)<<(x)) 

#define HostRequestBitMask SigBit(8) 
#define FrameDoneBit 8 

/**************************************************************************/ 
extern int *number_pixels; 

#pragma DATA_SECTION(Semaphore,"mp_vars") 
long Semaphore; 

#pragma DATA_SECTION(encode_time,"mp_vars") 
#pragma DATA_SECTION(decode_time,"mp_vars") 
int encode_time, decode_time, proc_time; 

#pragma DATA_SECTION(start_time,"mp_vars") 
unsigned int start_time; 

#pragma DATA_SECTION(mean_pp,"sh_vars") 
#pragma DATA_SECTION(global_mean,"sh_vars") 
shared int mean_pp[4]; 
shared int global_mean; 

#pragma DATA_SECTION(local_maxval,"sh_vars") 
#pragma DATA_SECTION(global_maxval,"sh_vars") 
shared int local_maxval[4]; 
shared int global_maxval; 

typedef struct pstr 

{ 

unsigned char d,i; 

} PSTR; 


#pragma DATA_SECTION(local_alloc, "mp_vars") 
unsigned short local_alloc[AX*AY/6]; 

#pragma DATA_SECTION(comp_table,"mp_vars") 
unsigned char comp_table[5] = {10, 20, 40, 80, 100}; 

#pragma DATA_SECTION(last_comp,"mp_vars") 
unsigned char last_comp = 2; 

#pragma DATA_SECTION(compression_ratio,"mp_vars") 

/* UINT32 *compression_ratio = (UINT32 *) 0x010107D0; */ 
UINT32 *compression_ratio; 

static void SignalHandler(UINT32 Signals); 
static void init_alloc_table(UINT32 index); 
static void init_alloc_table2(UINT32 index); 


unsigned char ppnum[128]; 
unsigned char pptask[128]; 
int pptime[128]; 
int ppstarttime = 0; 
unsigned char ppindex = 0; 
/**************************************************************************/ 

void task (void *arg) 

{ 

unsigned char times=0; 
unsigned char flip = 0; 
unsigned int stream_addr[2]; 

PCIC80STAT ReturnVal; 


PVIL24 pVil24; 

ICL_IMG *vim,*f0,*fl; 
unsigned int dx,dy; 
int i,j; 

int Ip; 

unsigned char vert_loop_index[6],horiz_loop_index[6]; 
int jump_col[6]; 
long jump_row[6]; 

long *unused_pixels; 

long *ptr; /* temp pointer */ 

PTREQ *p[10]; /* temp pointer to packet transfer structure */ 

int temp_maxval; 

/* JCWtest */ 

char stringl[32]; 
int temp_str; 
float temp_time; 

int junki; 

unsigned short junkfill = 0; 

unsigned short * fillme = (unsigned short *) 0x90320000; 


int k,l; 
long tempi; 
unsigned char tbuf; 

int t_alloc; 
int p_alloc; 
int r_alloc; 

unsigned char *tbuf_pp0 = (unsigned char *) 0x010005fc; 
unsigned char *tbuf_pp1 = (unsigned char *) 0x010015fc; 
unsigned char *tbuf_pp2 = (unsigned char *) 0x010025fc; 
unsigned char *tbuf_pp3 = (unsigned char *) 0x010035fc; 

unsigned char min_first; 

unsigned char ppnexttask[4]; 
unsigned short ppalloc[4]; 
unsigned char current_block; 
unsigned int table_pointer = 0x80380000; 
unsigned int table_address[4]; 
unsigned short table_size[4]; 
unsigned char pprequesting; 
unsigned char ppinfo[4]; 
unsigned char ppdone; 

unsigned char *pp_stop_encode = (unsigned char *) 0x010007D4; 

unsigned char DONE1; 

float channel_bandwidth; 
float bits_per_frame; 
float fps; 

float secs_between_frames; 
float ticks_between_frames ; 
unsigned int time_wait; 

#ifdef INTERLEAVE_CAMERA 
unsigned int started__acq; 
unsigned int done_acq; 
unsigned int line_acq; 
unsigned int last_status; 
unsigned int this_status; 

#endif 

UINT32 temp_ulong; 

UINT32 save_p7_src; 

Semaphore = 0; 
comp_table[0] =10; 
comp_table[1] = 20; 
comp_table[2] = 40; 
comp_table[3] = 80; 
comp_table[4] = 100; 
last_comp = 2; 

compression_ratio = (UINT32 *) 0x010107D0; 

stream_addr[0] = 0x90023000; 
stream_addr[1] = 0x90028000; 

NOCACHE_INT(number_pixels[0]) = 0x80; 

NOCACHE_INT(global_mean) = 0x5e; 

*pp_stop_encode = 0; 

/* initialize FIRST variable on all PPs - used for first pass # of bitplanes to 

process */ 

tbuf_pp0[0] = 4; 

tbuf_pp1[0] = 4; 

tbuf_pp2[0] = 4; 

tbuf_pp3[0] = 4; 

#ifdef HOST 

ReturnVal = CilSigHandler(SignalHandler); 
if (ReturnVal != CIL_OK) 

{ 

/*** Cannot register signal handler ***/ 
while(1) ; 
return; 

} 

#endif 

dx = 512; /* size of ROI which will be initialised*/ 

dy = 480; /* if too big, may not run at frame rate*/ 


pVil24 = Vil24Open();                       /* Open VIL module */ 

if ( pVil24 == NULL ) exit(0);              /* Failed to open VIL module */ 

Vil24Initialize(pVil24,VIL24_EIA_DEFAULT);  /* set up for CCIR camera */ 

Vil24SetVcrMode(pVil24,1);                  /* ensure lock to poor sources */ 

Vil24SetROI(pVil24, 64,0, dx,dy);           /* set ROI in the center */ 

vim = IclCreateHdr(1,1,ICL_IMG_CUSTOM);     /* create image header structure ... */ 

Vil24InitImgHdr(pVil24,vim);                /* ... and initialise to describe VIL */ 

VolSetDisplay(VOL_VGA8_1024);               /* setup colour display for VIM-8 */ 

VolSetGreyLUT(); 

p[0] = (PTREQ *) (MP_PARM_RAM + 0x2c0); 
p[1] = p[0] + 1; 
p[2] = p[0] + 2; 
p[3] = p[0] + 3; 
p[4] = p[0] + 4; 
p[5] = p[0] + 5; 
p[6] = p[0] + 6; 
p[7] = p[0] + 7; 
p[8] = p[0] + 8; 
p[9] = p[0] + 9; 

/* Set MP list pointer to point to first PT */ 
ptr = (long *) (MP_PTREQ_PTR) ; 

*ptr = (long) p[0]; 

#ifdef INTERLEAVE_CAMERA 

/* Fifo Bank 0 -> DRAM (8 bit data)*/ 


p[0]->link = p[0];              /* point to next PT */ 
p[0]->word[0] = 0x80000002;     /* Weird Fifo to linear */ 
p[0]->word[1] = 0xa0000001;     /* Src address is VIM pixel fifo */ 
p[0]->word[2] = 0x80300000;     /* Dst address is DRAM */ 
p[0]->word[3] = 0x01ff0001;     /* Src B count Src A count */ 
p[0]->word[4] = 0x00000200;     /* Dst B count Dst A count */ 
p[0]->word[5] = 0x00;           /* Src C count */ 
p[0]->word[6] = 0;              /* Dst C count */ 
p[0]->word[7] = 0x08;           /* Src B pitch */ 
p[0]->word[8] = 0x000;          /* Dst B pitch */ 
p[0]->word[9] = 0x400;          /* Src C pitch */ 
p[0]->word[10] = 0x200;         /* Dst C pitch */ 
p[0]->word[11] = 0;             /* Src transparency upper */ 
p[0]->word[12] = 0;             /* Src transparency lower */ 
p[0]->word[13] = 0;             /* Reserved */ 
p[0]->word[14] = 0;             /* Reserved */ 


#else 

/* Fifo Bank 0 -> DRAM (8 bit data)*/ 


p[0]->link = p[0];              /* point to next PT */ 
p[0]->word[0] = 0x80000000;     /* Weird Fifo to linear */ 
p[0]->word[1] = 0xa0000001;     /* Src address is VIM pixel fifo */ 
p[0]->word[2] = 0x80300000;     /* Dst address is DRAM */ 
p[0]->word[3] = 0x01ff0001;     /* Src B count Src A count */ 
p[0]->word[4] = 0x00ef0200;     /* Dst B count Dst A count */ 
p[0]->word[5] = 0xef;           /* Src C count */ 
p[0]->word[6] = 0;              /* Dst C count */ 
p[0]->word[7] = 0x08;           /* Src B pitch */ 
p[0]->word[8] = 0x200;          /* Dst B pitch */ 
p[0]->word[9] = 0x400;          /* Src C pitch */ 
p[0]->word[10] = 0;             /* Dst C pitch */ 
p[0]->word[11] = 0;             /* Src transparency upper */ 
p[0]->word[12] = 0;             /* Src transparency lower */ 
p[0]->word[13] = 0;             /* Reserved */ 
p[0]->word[14] = 0;             /* Reserved */ 


#endif 

/* DRAM -> internal coefficient blocks (16 bit data) (32x16 words)*/ 


p[1]->link = p[1];              /* point to next PT */ 
p[1]->word[0] = 0x80000000;     /* Contig. Mem to int no update */ 
p[1]->word[1] = 0x80320000;     /* Src address is DRAM */ 
p[1]->word[2] = 0x00008000;     /* Dst address is internal */ 
p[1]->word[3] = 0x000f0040;     /* Src B count Src A count */ 
p[1]->word[4] = 0x00000400;     /* Dst B count Dst A count */ 
p[1]->word[5] = 0x00;           /* Src C count */ 
p[1]->word[6] = 0;              /* Dst C count */ 
p[1]->word[7] = 0x0400;         /* Src B pitch */ 
p[1]->word[8] = 0x000;          /* Dst B pitch */ 
p[1]->word[9] = 0x40;           /* Src C pitch */ 
p[1]->word[10] = 0x1000;        /* Dst C pitch */ 
p[1]->word[11] = 0;             /* Src transparency upper */ 
p[1]->word[12] = 0;             /* Src transparency lower */ 
p[1]->word[13] = 0;             /* Reserved */ 
p[1]->word[14] = 0;             /* Reserved */ 

/* DRAM (8 bit data) -> VRAM (raw image) */ 

p[2]->link = p[2];              /* point to next PT */ 
p[2]->word[0] = 0x80000000;     /* linear to VRAM */ 
p[2]->word[1] = 0x80300000;     /* Src address is DRAM */ 
p[2]->word[2] = 0xb4000000;     /* Dst address is VRAM */ 
p[2]->word[3] = 0x00038000;     /* Src B count Src A count */ 
p[2]->word[4] = 0x00ff0200;     /* Dst B count Dst A count */ 
p[2]->word[5] = 0x00;           /* Src C count */ 
p[2]->word[6] = 0;              /* Dst C count */ 
p[2]->word[7] = 0x8000;         /* Src B pitch */ 
p[2]->word[8] = 0x800;          /* Dst B pitch */ 
p[2]->word[9] = 0x00;           /* Src C pitch */ 
p[2]->word[10] = 0x0000;        /* Dst C pitch */ 
p[2]->word[11] = 0;             /* Src transparency upper */ 
p[2]->word[12] = 0;             /* Src transparency lower */ 
p[2]->word[13] = 0;             /* Reserved */ 
p[2]->word[14] = 0;             /* Reserved */ 

/* DRAM (rows and 8 bit data) -> internal RAM */ 
/* Can be used 4 times then change dst <- this can be done 8 times */ 

p[3]->link = p[3];              /* point to next PT */ 
p[3]->word[0] = 0x80000202;     /* Contig. Mem to int w/s update */ 
p[3]->word[1] = 0x80300000;     /* Src address is DRAM */ 
p[3]->word[2] = 0x00000000;     /* Dst address is internal */ 
p[3]->word[3] = 0x00000800;     /* Src B count Src A count */ 
p[3]->word[4] = 0x00000800;     /* Dst B count Dst A count */ 
p[3]->word[5] = 0x00;           /* Src C count */ 
p[3]->word[6] = 0;              /* Dst C count */ 
p[3]->word[7] = 0x0000;         /* Src B pitch */ 
p[3]->word[8] = 0x000;          /* Dst B pitch */ 
p[3]->word[9] = 0x0800;         /* Src C pitch */ 
p[3]->word[10] = 0x1000;        /* Dst C pitch */ 
p[3]->word[11] = 0;             /* Src transparency upper */ 
p[3]->word[12] = 0;             /* Src transparency lower */ 
p[3]->word[13] = 0;             /* Reserved */ 
p[3]->word[14] = 0;             /* Reserved */ 


/* DRAM (columns and 16 bit data) -> internal RAM */ 

/* Can be used 4 times then change dst <- this can be done 8 times */ 


p[4]->link = p[4];              /* point to next PT */ 
p[4]->word[0] = 0x80000002;     /* Contig. Mem to int w/dst updt */ 
p[4]->word[1] = 0x80320000;     /* Src address is DRAM */ 
p[4]->word[2] = 0x00000000;     /* Dst address is internal */ 
p[4]->word[3] = 0x00ff0010;     /* Src B count Src A count */ 
p[4]->word[4] = 0x00001000;     /* Dst B count Dst A count */ 
p[4]->word[5] = 0x00;           /* Src C count */ 
p[4]->word[6] = 0;              /* Dst C count */ 
p[4]->word[7] = 0x0400;         /* Src B pitch */ 
p[4]->word[8] = 0x000;          /* Dst B pitch */ 
p[4]->word[9] = 0x10;           /* Src C pitch */ 
p[4]->word[10] = 0x1000;        /* Dst C pitch */ 
p[4]->word[11] = 0;             /* Src transparency upper */ 
p[4]->word[12] = 0;             /* Src transparency lower */ 
p[4]->word[13] = 0;             /* Reserved */ 
p[4]->word[14] = 0;             /* Reserved */ 

/* internal RAM -> DRAM (columns and 16 bit data) */ 
/* Can be used 4 times then change src then this can be repeated 8 times */ 

p[5]->link = p[5];              /* point to next PT */ 
p[5]->word[0] = 0x80000200;     /* int to Contig. Mem w/src updt */ 
p[5]->word[1] = 0x00000000;     /* Src address is internal */ 
p[5]->word[2] = 0x80320000;     /* Dst address is DRAM */ 
p[5]->word[3] = 0x00001000;     /* Src B count Src A count */ 
p[5]->word[4] = 0x00ff0010;     /* Dst B count Dst A count */ 
p[5]->word[5] = 0x00;           /* Src C count */ 
p[5]->word[6] = 0;              /* Dst C count */ 
p[5]->word[7] = 0x0000;         /* Src B pitch */ 
p[5]->word[8] = 0x400;          /* Dst B pitch */ 
p[5]->word[9] = 0x1000;         /* Src C pitch */ 
p[5]->word[10] = 0x0010;        /* Dst C pitch */ 
p[5]->word[11] = 0;             /* Src transparency upper */ 
p[5]->word[12] = 0;             /* Src transparency lower */ 
p[5]->word[13] = 0;             /* Reserved */ 
p[5]->word[14] = 0;             /* Reserved */ 

/* DRAM (rows and 16 bit data) -> internal RAM */ 
/* Can be used 4 times then change dst then this can be repeated 8 times */ 

p[6]->link = p[6];              /* point to next PT */ 
p[6]->word[0] = 0x80000002;     /* Contig. Mem to int w/dst updt */ 
p[6]->word[1] = 0x80320000;     /* Src address is DRAM */ 
p[6]->word[2] = 0x00000000;     /* Dst address is internal */ 
p[6]->word[3] = 0x00001000;     /* Src B count Src A count */ 
p[6]->word[4] = 0x00001000;     /* Dst B count Dst A count */ 
p[6]->word[5] = 0x00;           /* Src C count */ 
p[6]->word[6] = 0;              /* Dst C count */ 
p[6]->word[7] = 0x0000;         /* Src B pitch */ 
p[6]->word[8] = 0x000;          /* Dst B pitch */ 
p[6]->word[9] = 0x0000;         /* Src C pitch */ 
p[6]->word[10] = 0x1000;        /* Dst C pitch */ 
p[6]->word[11] = 0;             /* Src transparency upper */ 
p[6]->word[12] = 0;             /* Src transparency lower */ 
p[6]->word[13] = 0;             /* Reserved */ 
p[6]->word[14] = 0;             /* Reserved */ 


/* internal RAM -> DRAM (rows and 16 bit data) */ 
/* Can be used 4 times then change src then this can be repeated 8 times */ 

p[7]->link = p[7];              /* point to next PT */ 
p[7]->word[0] = 0x80000200;     /* int to DRAM w/src updt */ 
p[7]->word[1] = 0x00000000;     /* Src address is internal */ 
p[7]->word[2] = 0x80320000;     /* Dst address is DRAM */ 
p[7]->word[3] = 0x00001000;     /* Src B count Src A count */ 
p[7]->word[4] = 0x00001000;     /* Dst B count Dst A count */ 
p[7]->word[5] = 0x00;           /* Src C count */ 
p[7]->word[6] = 0;              /* Dst C count */ 
p[7]->word[7] = 0x0000;         /* Src B pitch */ 
p[7]->word[8] = 0x000;          /* Dst B pitch */ 
p[7]->word[9] = 0x1000;         /* Src C pitch */ 
p[7]->word[10] = 0x0000;        /* Dst C pitch */ 
p[7]->word[11] = 0;             /* Src transparency upper */ 
p[7]->word[12] = 0;             /* Src transparency lower */ 
p[7]->word[13] = 0;             /* Reserved */ 
p[7]->word[14] = 0;             /* Reserved */ 


/* internal RAM -> VRAM (rows and 8 bit data) */ 

/* Can be used 4 times then change src then this can be repeated 8 times */ 


p[8]->link = p[8];              /* point to next PT */ 
p[8]->word[0] = 0x80000202;     /* int to VRAM w/dst updt */ 
p[8]->word[1] = 0x00000000;     /* Src address is internal */ 
p[8]->word[2] = 0xb4000200;     /* Dst address is VRAM */ 
p[8]->word[3] = 0x00000800;     /* Src B count Src A count */ 
p[8]->word[4] = 0x00030200;     /* Dst B count Dst A count */ 
p[8]->word[5] = 0x00;           /* Src C count */ 
p[8]->word[6] = 0;              /* Dst C count */ 
p[8]->word[7] = 0x0000;         /* Src B pitch */ 
p[8]->word[8] = 0x400;          /* Dst B pitch */ 
p[8]->word[9] = 0x1000;         /* Src C pitch */ 
p[8]->word[10] = 0x1000;        /* Dst C pitch */ 
p[8]->word[11] = 0;             /* Src transparency upper */ 
p[8]->word[12] = 0;             /* Src transparency lower */ 
p[8]->word[13] = 0;             /* Reserved */ 
p[8]->word[14] = 0;             /* Reserved */ 

/* internal RAM -> DRAM (byte stream 8 bit data) */ 

p[9]->link = p[9];              /* point to next PT */ 
p[9]->word[0] = 0x80000202;     /* int to VRAM w/dst updt */ 
p[9]->word[1] = 0x01000630;     /* Src address is internal */ 
p[9]->word[2] = 0x80380000;     /* Dst address is VRAM JCWW */ 
p[9]->word[3] = 0;              /* Src B count Src A count */ 
p[9]->word[4] = 0;              /* Dst B count Dst A count */ 
p[9]->word[5] = 0x00;           /* Src C count */ 
p[9]->word[6] = 0;              /* Dst C count */ 
p[9]->word[7] = 0x0000;         /* Src B pitch */ 
p[9]->word[8] = 0x000;          /* Dst B pitch */ 
p[9]->word[9] = 0x1000;         /* Src C pitch */ 
p[9]->word[10] = 0;             /* Dst C pitch */ 
p[9]->word[11] = 0;             /* Src transparency upper */ 
p[9]->word[12] = 0;             /* Src transparency lower */ 
p[9]->word[13] = 0;             /* Reserved */ 
p[9]->word[14] = 0;             /* Reserved */ 

Vil24SetSequenceMode(pVil24,1); /* set VIL to sequence mode */ 

Vil24StartCapture(pVil24); /* start capturing images */ 

/* IE = disable () & 0xfbf0fffe; */ 

IE = 0x01; 

command(0x2000000f); /* unhalt PP0 */ 

vert_loop_index[0] = 16; 
vert_loop_index[1] = 4; 
vert_loop_index[2] = 1; 
vert_loop_index[3] = 1; 
vert_loop_index[4] = 1; 

horiz_loop_index[0] = 15; 
horiz_loop_index[1] = 5; 
horiz_loop_index[2] = 1; 
horiz_loop_index[3] = 1; 
horiz_loop_index[4] = 1; 

jump_col[0] = 16; 
jump_col[1] = 64; 
jump_col[2] = 256; 
jump_col[3] = 256; 
jump_col[4] = 256; 

jump_row[0] = 0x1000; /* not used */ 

jump_row[1] = 0x3000; 
jump_row[2] = 0xF000; 
jump_row[3] = 0x10000; 
jump_row[4] = 0x10000; 

/* Zero out unused lines in the input (lines 240-255) */ 

/* unused_pixels = (long *) 0x8031e000;

for (i=0;i<2048;i++)

NOCACHE_INT(unused_pixels[i]) = 0; */

/* setup for fill with value PT */ 


p[9]->link = p[9]; /* point to next PT */
p[9]->word[0] = 0x80001000; /* int to VRAM w/dst updt */
p[9]->word[1] = 0x00000000; /* Src address is internal */
p[9]->word[2] = 0x8031e000; /* Dst address is VRAM JCWW */
p[9]->word[3] = 0x00000000; /* Src B count Src A count */
p[9]->word[4] = 0x00002000; /* Dst B count Dst A count */
p[9]->word[5] = 0x00; /* Src C count */
p[9]->word[6] = 0; /* Dst C count */
p[9]->word[7] = 0x0000; /* Src B pitch */
p[9]->word[8] = 0x000; /* Dst B pitch */
p[9]->word[9] = 0x0000; /* Src C pitch */
p[9]->word[10] = 0; /* Dst C pitch */
p[9]->word[11] = 0; /* Src transparency upper */
p[9]->word[12] = 0; /* Src transparency lower */
p[9]->word[13] = 0; /* Reserved */
p[9]->word[14] = 0; /* Reserved */

/* zero out unused pixels using a fill with value packet transfer */ 

*ptr = (long) p[9]; 

/* kick off TC to transfer pixel info to DRAM */ 

PKTREQ |= MP_PKTREQ_P_BIT; i = 0; while(PKTREQ & 0x02);

p[9]->word[2] = 0x80420000; /* Dst address is VRAM JCWW */

p[9]->word[4] = 0x003B1000; /* Dst B count Dst A count */

p[9]->word[8] = 0x1000; /* Dst B pitch */

/* zero out buffer */

*ptr = (long) p[9];

/* kick off TC to transfer pixel info to DRAM */

PKTREQ |= MP_PKTREQ_P_BIT; i = 0; while(PKTREQ & 0x02);

/* re-initialize p[9] */

p[9]->link = p[9]; /* point to next PT */
p[9]->word[0] = 0x80000202; /* int to VRAM w/dst updt */
p[9]->word[1] = 0x01000630; /* Src address is internal */
p[9]->word[2] = 0x80380000; /* Dst address is VRAM JCWW */
p[9]->word[3] = 0; /* Src B count Src A count */
p[9]->word[4] = 0; /* Dst B count Dst A count */
p[9]->word[5] = 0x00; /* Src C count */
p[9]->word[6] = 0; /* Dst C count */
p[9]->word[7] = 0x0000; /* Src B pitch */
p[9]->word[8] = 0x000; /* Dst B pitch */
p[9]->word[9] = 0x1000; /* Src C pitch */
p[9]->word[10] = 0; /* Dst C pitch */
p[9]->word[11] = 0; /* Src transparency upper */
p[9]->word[12] = 0; /* Src transparency lower */
p[9]->word[13] = 0; /* Reserved */
p[9]->word[14] = 0; /* Reserved */

/* Initialize the adaptive bit allocation tables on the PPs */
init_alloc_table(2);

Vil24SetReadBank(pVil24,0); /* prepare to read 1st field */

Vil24WaitForData(pVil24); /* wait for image data */

TCOUNT = 0xffffffff;

channel_bandwidth = 48000.0;

bits_per_frame = (YSIZE*XSIZE/comp_table[last_comp] - headersize) * 8.0;
fps = channel_bandwidth / bits_per_frame;
secs_between_frames = 1.0/fps;

ticks_between_frames = secs_between_frames/0.000000025;
time_wait = ticks_between_frames;
time_wait = 0xffffffff - time_wait;
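
The frame-pacing arithmetic above can be isolated as a small host-side sketch. It assumes, as the listing's constant 0.000000025 suggests, a 25 ns timer tick (40,000,000 ticks per second) and a 32-bit down-counter loaded with 0xffffffff; the function names and the saturation behaviour are illustrative assumptions, not part of the original code. The tick rate is used instead of the tick period so that the example values divide exactly.

```c
#include <stdint.h>

/* Assumption: a 25 ns timer tick, i.e. 40,000,000 ticks per second. */
#define TICKS_PER_SECOND 40000000.0

/* Payload bits per frame: image bytes divided by the compression ratio,
   minus the header bytes, converted to bits (mirrors bits_per_frame). */
static double frame_payload_bits(int xsize, int ysize, double ratio, int header_bytes)
{
    return ((double)xsize * (double)ysize / ratio - (double)header_bytes) * 8.0;
}

/* Timer ticks between frames for a channel of bandwidth_bps bits per second.
   bandwidth_bps is assumed positive; the result saturates at 0xffffffff
   (the saturation is an addition of this sketch). */
static uint32_t frame_interval_ticks(double bandwidth_bps, double frame_bits)
{
    double ticks = frame_bits * TICKS_PER_SECOND / bandwidth_bps;
    if (ticks >= 4294967295.0)
        return 0xffffffffu;
    if (ticks <= 0.0)
        return 0u;
    return (uint32_t)ticks;
}

/* Value the down-counter must fall to before the next frame may start
   (mirrors time_wait = 0xffffffff - time_wait). */
static uint32_t wait_threshold(uint32_t interval_ticks)
{
    return 0xffffffffu - interval_ticks;
}
```

For example, a 512x240 frame at a hypothetical ratio of 128 with an 8-byte header gives 7616 payload bits; a 9600-bit frame on the 48000 bit/s channel gives 5 frames per second, or 8,000,000 ticks between frames.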

while (1) 

{ 

#ifdef CHANNEL_LIMIT

while (TCOUNT > time_wait);

#endif

#ifdef CAMERA

Vil24WaitForData(pVil24); /* wait for image data */

Vil24SetReadBank(pVil24,0); /* prepare to read 1st field */

#endif

encode_time = 0xffffffff - TCOUNT;

TCOUNT = 0xffffffff;

#ifdef CAMERA

*ptr = (long) p[0];

/* kick off TC to transfer pixel info to DRAM */

PKTREQ |= MP_PKTREQ_P_BIT; i = 0; while(PKTREQ & 0x02);

#endif

/* clear the interrupt flag */

INTPEN = 0xf0000;

#ifdef HOST

#ifndef variable_compression
/* Read Compression Ratio */

CilReadMailbox(1, (PUINT32)compression_ratio);

channel_bandwidth = ((*compression_ratio) >> 8) * 1.0;

if (channel_bandwidth < 10) channel_bandwidth = 56000.0;

if (channel_bandwidth > 100000001) channel_bandwidth = 56000.0;

*compression_ratio = *compression_ratio & 0xff;

if (*compression_ratio > 4) *compression_ratio = 2;
if (*compression_ratio < 0) *compression_ratio = 2;

if (*compression_ratio != last_comp)

{

last_comp = *compression_ratio;
init_alloc_table(last_comp);

}

bits_per_frame = (YSIZE*XSIZE/comp_table[last_comp] - headersize) * 8.0;
fps = channel_bandwidth / bits_per_frame;
secs_between_frames = 1.0/fps;

ticks_between_frames = secs_between_frames/0.000000025;
time_wait = ticks_between_frames;
time_wait = 0xffffffff - time_wait;

#endif

#ifdef variable_compression
/* Read Compression Ratio */

CilReadMailbox(1, (PUINT32)compression_ratio);

if (*compression_ratio > 16640) *compression_ratio = 16640;
if (*compression_ratio < 480) *compression_ratio = 480;

if (*compression_ratio != last_comp)

{

last_comp = *compression_ratio;
init_alloc_table2(last_comp);

}

#endif

#endif
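
The fixed-ratio branch above unpacks one 32-bit mailbox word into a channel bandwidth (upper 24 bits) and a compression-table index (low 8 bits), replacing out-of-range values with defaults. A minimal sketch of that decoding, assuming the field layout implied by the shift and mask in the listing; the struct and function names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical container for the two decoded fields. */
typedef struct {
    double   channel_bandwidth; /* bits per second */
    uint32_t ratio_index;       /* index into the compression table, 0..4 */
} MailboxSettings;

static MailboxSettings decode_mailbox(uint32_t word)
{
    MailboxSettings s;

    /* Upper 24 bits carry the bandwidth; implausibly small values fall
       back to 56000 bit/s, as in the listing. The listing also tests an
       upper bound, which a 24-bit field can never exceed, so it is
       omitted here. */
    s.channel_bandwidth = (double)(word >> 8);
    if (s.channel_bandwidth < 10.0)
        s.channel_bandwidth = 56000.0;

    /* Low 8 bits select the compression ratio; indices above 4 fall
       back to 2. The field is unsigned here, so the listing's "< 0"
       test has no counterpart. */
    s.ratio_index = word & 0xffu;
    if (s.ratio_index > 4u)
        s.ratio_index = 2u;

    return s;
}
```

A word of `(48000u << 8) | 3u` decodes to 48000 bit/s with index 3, while `(5u << 8) | 7u` falls back to 56000 bit/s with index 2.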

/* forget second field */ 

/* Display the incoming raw data */ 
p[2]->word[2] = 0xb4000000;

*ptr = (long) p[2];

/* kick off TC to transfer pixel from DRAM to VRAM */

PKTREQ |= MP_PKTREQ_P_BIT; i = 0; while(PKTREQ & 0x02);

p[2]->word[2] = 0xb4000400;

/* kick off TC to transfer pixel from DRAM to VRAM */

PKTREQ |= MP_PKTREQ_P_BIT; i = 0; while(PKTREQ & 0x02);

/* Encode coefficients of the raw image */

#ifdef ENCODE

/* TCOUNT = 0xffffffff; */

p[3]->word[1] = 0x80300000; /* Src address is DRAM */

for (lp=0; lp<5; lp++)

{

/* Do 32 rows */

for (i=0; i<horiz_loop_index[lp]; i++) 

{ 


if (i==0) 

{ 

p[6]->word[1] = 0x80320000; /* Src address is SDRAM */ 
p[7]->word[2] = 0x80320000; /* Dst address is SDRAM */ 
} 

p[3]->word[2] = 0x00000000; /* Dst address is internal */
p[6]->word[2] = 0x00000000; /* Dst address is internal */
p[7]->word[1] = 0x00000000; /* Src address is internal */

if (lp==0)
*ptr = (long) p[3];
else
*ptr = (long) p[6];

if (i==0)

{

switch (lp) {
case 0:

{

/* p[6] never used here to move coefficients in the first time, p[3] is used instead */

p[6]->word[3] = 0x00001000;
p[6]->word[4] = 0x00001000;
p[6]->word[5] = 0x00000000;
p[6]->word[7] = 0x00000000;
p[6]->word[9] = 0x00000000;

p[7]->word[3] = 0x00001000;
p[7]->word[4] = 0x00001000;
p[7]->word[6] = 0x00000000;
p[7]->word[8] = 0x00000000;
p[7]->word[10] = 0x00000000;
break;

}

case 1:

{

p[6]->word[3] = 0x00ff0002;
p[6]->word[4] = 0x00000c00;
p[6]->word[5] = 0x00000005;
p[6]->word[7] = 0x00000004;
p[6]->word[9] = 0x00000800;

p[7]->word[3] = 0x00000c00;
p[7]->word[4] = 0x00ff0002;
p[7]->word[6] = 0x00000005;
p[7]->word[8] = 0x00000004;
p[7]->word[10] = 0x00000800;
break;

}

case 2:

{

p[6]->word[3] = 0x007f0002;
p[6]->word[4] = 0x00000f00;
p[6]->word[5] = 0x0000000e;
p[6]->word[7] = 0x00000008;
p[6]->word[9] = 0x00001000;

p[7]->word[3] = 0x00000f00;
p[7]->word[4] = 0x007f0002;
p[7]->word[6] = 0x0000000e;
p[7]->word[8] = 0x00000008;
p[7]->word[10] = 0x00001000;
break;

}

case 3:

{

p[6]->word[3] = 0x003f0002;
p[6]->word[4] = 0x00000400;
p[6]->word[5] = 0x00000007;
p[6]->word[7] = 0x00000010;
p[6]->word[9] = 0x00002000;

p[7]->word[3] = 0x00000400;
p[7]->word[4] = 0x003f0002;
p[7]->word[6] = 0x00000007;
p[7]->word[8] = 0x00000010;
p[7]->word[10] = 0x00002000;
break;

}

case 4:

{

p[6]->word[3] = 0x001f0002;
p[6]->word[4] = 0x00000100;
p[6]->word[5] = 0x00000003;
p[6]->word[7] = 0x00000020;
p[6]->word[9] = 0x00004000;

p[7]->word[3] = 0x00000100;
p[7]->word[4] = 0x001f0002;
p[7]->word[6] = 0x00000003;
p[7]->word[8] = 0x00000020;
p[7]->word[10] = 0x00004000;
break;

}

default:

break;

}

}



PKTREQ |= MP_PKTREQ_P_BIT; /* kick off transfer from SDRAM to internal */ 

i = i + 1; 
i = i - 1; 

while(PKTREQ & 0x02); 

#ifndef DEBUG 

command(0x00002001) ; /* send msg interrupt to PP0 */ 

#endif 

p[6]->word[1] = p[6]->word[1] + jump_row[lp]; 

PKTREQ |= MP_PKTREQ_P_BIT; /* kick off transfer from SDRAM to internal */

i = i + 1;
i = i - 1;

while(PKTREQ & 0x02);

#ifndef DEBUG 

command(0x00002002); /* send msg interrupt to PP1 */ 

#endif 

p[6]->word[1] = p[6]->word[1] + jump_row[lp];

PKTREQ |= MP_PKTREQ_P_BIT; /* kick off transfer from SDRAM to internal */

i = i + 1;
i = i - 1; 

while(PKTREQ & 0x02); 

#ifndef DEBUG 

command(0x00002004); /* send msg interrupt to PP2 */ 

#endif 

p[6]->word[1] = p[6]->word[1] + jump_row[lp];

PKTREQ |= MP_PKTREQ_P_BIT; /* kick off transfer from SDRAM to internal */ 

i = i + 1; 
i = i - 1; 

while(PKTREQ & 0x02); 

#ifndef DEBUG 

command(0x00002008); /* send msg interrupt to PP3 */ 

#endif 

p[6]->word[1] = p[6]->word[1] + jump_row[lp];

#ifndef DEBUG 

while((INTPEN & 0x10000)==0x00); /* poll PP0 */ 

INTPEN = 0x10000; /* clear the interrupt flag */ 

#endif 

*ptr = (long) p[7]; 

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from internal to DRAM */

i = i + 1;
i = i - 1; 

while(PKTREQ & 0x02); 

p[7]->word[2] = p[7]->word[2] + jump_row[lp];

#ifndef DEBUG 

while((INTPEN & 0x20000)==0x00); /* poll PP1 */ 

INTPEN = 0x20000; /* clear the interrupt flag */ 

#endif 

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from internal to DRAM */ 

i = i + 1; 
i = i - 1; 

while(PKTREQ & 0x02); 
p[7]->word[2] = p[7]->word[2] + jump_row[lp];

#ifndef DEBUG 

while((INTPEN & 0x40000)==0x00); /* poll PP2 */ 

INTPEN = 0x40000; /* clear the interrupt flag */ 

#endif 

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from internal to DRAM */ 

i = i + 1; 
i = i - 1;

while(PKTREQ & 0x02); 

p[7]->word[2] = p[7]->word[2] + jump_row[lp]; 

#ifndef DEBUG 

while((INTPEN & 0x80000)==0x00); /* poll PP3 */ 

INTPEN = 0x80000; /* clear the interrupt flag */ 

#endif 

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from internal to DRAM */ 

i = i + 1;
i = i - 1; 

while(PKTREQ & 0x02); 

p[7]->word[2] = p[7]->word[2] + jump_row[lp];

} /* end for i */ 


NOCACHE_INT(global_mean) = (NOCACHE_INT(mean_pp[0]) + NOCACHE_INT(mean_pp[1]) +
NOCACHE_INT(mean_pp[2]) + NOCACHE_INT(mean_pp[3])) / (512*240);


/* Do the appropriate number of columns depending on the scale */
if (lp!=4)

for (i=0; i<vert_loop_index[lp]; i++) 

{ 

if (i==0) 

{ 

p[4]->word[1] = 0x80320000; /* Src address is DRAM */ 

p[5]->word[2] = 0x80320000; /* Dst address is DRAM */ 

} 

p[4]->word[2] = 0x00000000; /* Dst address is internal */ 

*ptr = (long) p[4]; 
#ifdef DEBUG
/* JCWtest */
if (i==1)

{

for (junki=0;junki<256*256;junki=junki+1)

{

fillme[junki] = junkfill;
junkfill = junkfill + 1;

}

}

#endif
if (i==0)

{


switch (lp) {
case 0:

{

p[4]->word[3] = 0x00ef0010;
p[4]->word[4] = 0x00000f00;
p[4]->word[5] = 0x00000000;
p[4]->word[7] = 0x00000400;
p[4]->word[9] = 0x00000000;

p[5]->word[3] = 0x00000f00;
p[5]->word[4] = 0x00ef0010;
p[5]->word[6] = 0x00000000;
p[5]->word[8] = 0x00000400;
p[5]->word[10] = 0x00000000;
break;

}

case 1:

{

p[4]->word[3] = 0x000f0002;
p[4]->word[4] = 0x00000f00;
p[4]->word[5] = 0x00000077;
p[4]->word[7] = 0x00000004;
p[4]->word[9] = 0x00000800;

p[5]->word[3] = 0x00000f00;
p[5]->word[4] = 0x000f0002;
p[5]->word[6] = 0x00000077;
p[5]->word[8] = 0x00000004;
p[5]->word[10] = 0x00000800;
break;

}

case 2:

{

p[4]->word[3] = 0x001f0002;
p[4]->word[4] = 0x00000f00;
p[4]->word[5] = 0x0000003b;
p[4]->word[7] = 0x00000008;
p[4]->word[9] = 0x00001000;

p[5]->word[3] = 0x00000f00;
p[5]->word[4] = 0x001f0002;
p[5]->word[6] = 0x0000003b;
p[5]->word[8] = 0x00000008;
p[5]->word[10] = 0x00001000;
break;

}

case 3:

{

p[4]->word[3] = 0x000f0002;
p[4]->word[4] = 0x000003c0;
p[4]->word[5] = 0x0000001d;
p[4]->word[7] = 0x00000010;
p[4]->word[9] = 0x00002000;

p[5]->word[3] = 0x000003c0;
p[5]->word[4] = 0x000f0002;
p[5]->word[6] = 0x0000001d;
p[5]->word[8] = 0x00000010;
p[5]->word[10] = 0x00002000;
break;

}

default:
break;

}

}


/* kick off TC to transfer columns of pixels from DRAM to internal */ 

PKTREQ |= MP_PKTREQ_P_BIT; 

i = i + 1; 
i = i - 1; 

while(PKTREQ & 0x02); 

#ifndef DEBUG 

command(0x00002001); /* send msg interrupt to PP0 */

#endif 

p[4]->word[1] = p[4]->word[1] + jump_col[lp]; /* Src address is DRAM */

/* kick off TC to transfer columns of pixels from DRAM to internal */ 

PKTREQ |= MP_PKTREQ_P_BIT; 

i = i + 1; 
i = i - 1; 

while(PKTREQ & 0x02); 

#ifndef DEBUG 

command(0x00002002); /* send msg interrupt to PP1 */ 

#endif 

p[4]->word[1] = p[4]->word[1] + jump_col[lp]; /* Src address is DRAM */ 

/* kick off TC to transfer columns of pixels from DRAM to internal */ 

PKTREQ |= MP_PKTREQ_P_BIT; 

i = i + 1;
i = i - 1; 

while(PKTREQ & 0x02); 

#ifndef DEBUG 

command(0x00002004); /* send msg interrupt to PP2 */ 

#endif 

p[4]->word[1] = p[4]->word[1] + jump_col[lp]; /* Src address is DRAM */

/* kick off TC to transfer columns of pixels from DRAM to internal */

PKTREQ |= MP_PKTREQ_P_BIT;

i = i + 1; 
i = i - 1; 

while(PKTREQ & 0x02); 

#ifndef DEBUG 

command(0x00002008); /* send msg interrupt to PP3 */ 

#endif 

p[4]->word[1] = p[4]->word[1] + jump_col[lp]; /* Src address is DRAM */

p[5]->word[1] = 0x00000000; /* Src address is internal */

*ptr = (long) p[5]; 

#ifndef DEBUG 

while((INTPEN & 0x10000)==0x00); /* poll PP0 */ 

INTPEN = 0x10000; /* clear the interrupt flag */ 

#endif 

PKTREQ |= MP_PKTREQ_P_BIT; /* kick off transfer from internal to DRAM */ 

i = i + 1; 
i = i - 1;

while(PKTREQ & 0x02); 

p[5]->word[2] = p[5]->word[2] + jump_col[lp]; /* Dst address is DRAM */

#ifndef DEBUG 

while((INTPEN & 0x20000)==0x00); /* poll PP1 */ 

INTPEN = 0x20000; /* clear the interrupt flag */ 

#endif

PKTREQ |= MP_PKTREQ_P_BIT; /* kick off transfer from internal to DRAM */

i = i + 1; 
i = i - 1; 

while(PKTREQ & 0x02); 

p[5]->word[2] = p[5]->word[2] + jump_col[lp]; /* Dst address is DRAM */

#ifndef DEBUG 

while((INTPEN & 0x40000)==0x00); /* poll PP2 */ 

INTPEN = 0x40000; /* clear the interrupt flag */ 

#endif 

PKTREQ |= MP_PKTREQ_P_BIT; /* kick off transfer from internal to DRAM */ 

i = i + 1; 
i = i - 1; 

while(PKTREQ & 0x02); 

p[5]->word[2] = p[5]->word[2] + jump_col[lp]; /* Dst address is DRAM */

#ifndef DEBUG

while((INTPEN & 0x80000)==0x00); /* poll PP3 */

INTPEN = 0x80000; /* clear the interrupt flag */

#endif 

PKTREQ |= MP_PKTREQ_P_BIT; /* kick off transfer from internal to DRAM */ 

i = i + 1; 
i = i - 1;

while(PKTREQ & 0x02);

p[5]->word[2] = p[5]->word[2] + jump_col[lp]; /* Dst address is DRAM */


} 


} /* end for lp */ 

#ifdef OLD_DISPLAY 

/* Display the coefficients */ 


p[6]->word[1] = 0x80320000; /* Src address is DRAM */
p[6]->word[3] = 0x00001000;
p[6]->word[4] = 0x00001000;
p[6]->word[5] = 0x00000000;
p[6]->word[7] = 0x00000000;
p[6]->word[9] = 0x00001000;

p[8]->word[2] = 0xb4080000; /* Dst address is VRAM */

for (i=0;i<16;i++)

{

*ptr = (long) p[6];
p[6]->word[2] = 0x00000000; /* Dst address is internal */

PKTREQ |= MP_PKTREQ_P_BIT; /* kick off transfer from DRAM to internal */


i = i + 1; 
i = i - 1; 


while(PKTREQ & 0x02); 

command(0x00002001); /* send msg interrupt to PP0 */ 

p[6]->word[1] = p[6]->word[1] + jump_row[0];

PKTREQ |= MP_PKTREQ_P_BIT; /* kick off transfer from SDRAM to internal */

i = i + 1;

i = i - 1; 

while(PKTREQ & 0x02); 

command(0x00002002); /* send msg interrupt to PP1 */ 
p[6]->word[1] = p[6]->word[1] + jump_row[0];

PKTREQ |= MP_PKTREQ_P_BIT; /* kick off transfer from SDRAM to internal */

i = i + 1;
i = i - 1;

while(PKTREQ & 0x02); 

command(0x00002004); /* send msg interrupt to PP2 */ 

p[6]->word[1] = p[6]->word[1] + jump_row[0];

PKTREQ |= MP_PKTREQ_P_BIT; /* kick off transfer from SDRAM to internal */ 

i = i + 1; 
i = i - 1; 

while(PKTREQ & 0x02); 

command(0x00002008); /* send msg interrupt to PP3 */ 

p[6]->word[1] = p[6]->word[1] + jump_row[0];

while((INTPEN & 0x10000)==0x00) ; /* poll PP0 */ 

INTPEN = 0x10000; /* clear the interrupt flag */ 

*ptr = (long) p[8]; 

p[8]->word[1] = 0x00000000; /* Src address is internal */ 

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from internal to VRAM */ 

i = i + 1;
i = i - 1; 

while(PKTREQ & 0x02); 

while((INTPEN & 0x20000)==0x00) ; /* poll PP1 */ 

INTPEN = 0x20000; /* clear the interrupt flag */ 

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from internal to VRAM */

i = i + 1; 
i = i - 1; 

while(PKTREQ & 0x02); 

while((INTPEN & 0x40000)==0x00); /* poll PP2 */ 

INTPEN = 0x40000; /* clear the interrupt flag */ 

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from internal to VRAM */

i = i + 1; 
i = i - 1; 

while(PKTREQ & 0x02); 

while((INTPEN & 0x80000)==0x00); /* poll PP3 */ 

INTPEN = 0x80000; /* clear the interrupt flag */ 

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from internal to VRAM */

i = i + 1;
i = i - 1; 

while(PKTREQ & 0x02); 

} 

#endif 

#endif 

*pp_stop_encode = 0; 
#ifdef ENCODE_STREAM 


#ifdef HOST 

/* Tell host where bit stream can be found */ 

ReturnVal = CilWriteMailbox(0,stream_addr[flip] ) ; 

if (ReturnVal != CIL_OK) 

{ 

/*** Cannot write to mailbox ***/ 
while (1); 
return; 

} 

#endif 


/* DRAM (rows and 8 bit data) -> internal RAM */ 

/* Can be used 4 times then change dst <- this can be done 8 times */

p[3]->word[0] = 0x80000002; /* Contig. Mem to int w/s update */
p[3]->word[1] = 0x00000000; /* Src address is DRAM */
p[3]->word[2] = 0x80420000; /* Dst address is internal */
p[3]->word[3] = 0x00000c00; /* Src B count Src A count */
p[3]->word[4] = 0x00000c00; /* Dst B count Dst A count */
p[3]->word[5] = 0x00; /* Src C count */
p[3]->word[6] = 0; /* Dst C count */
p[3]->word[7] = 0x0000; /* Src B pitch */
p[3]->word[8] = 0x000; /* Dst B pitch */
p[3]->word[9] = 0x1000; /* Src C pitch */
p[3]->word[10] = 0x0c00; /* Dst C pitch */


/* internal RAM -> DRAM (rows and 16 bit data) */ 

/* Can be used 4 times then change src then this can be repeated 8 times */ 


p[7]->word[0] = 0x80000202; /* int to DRAM w/dst updt */
p[7]->word[1] = 0x00000000; /* Dst address is SDRAM */
p[7]->word[2] = 0x80400000; /* Dst address is SDRAM */
p[7]->word[3] = 0x00000c00; /* Src B count Src A count */
p[7]->word[4] = 0x00000c00; /* Dst B count Dst A count */
p[7]->word[6] = 0x00000000;
p[7]->word[8] = 0x00000000;
p[7]->word[9] = 0x1000; /* Src C pitch */
p[7]->word[10] = 0x0c00; /* Dst C pitch */

p[9]->word[2] = stream_addr[flip] + 8; /* Dst address is external SDRAM */

/* DRAM (rows and 16 bit data) -> internal RAM */ 

/* Can be used 4 times then change dst then this can be repeated 8 times */ 


p[6]->word[0] = 0x80000200; /* Contig. Mem to int w/dst updt */
p[6]->word[1] = 0x80420000; /* Src address is DRAM */
p[6]->word[2] = 0x00000000; /* Dst address is internal */
p[6]->word[3] = 0x00000400; /* Src B count Src A count */
p[6]->word[4] = 0x00000400; /* Dst B count Dst A count */
p[6]->word[5] = 0x00; /* Src C count */
p[6]->word[6] = 0; /* Dst C count */
p[6]->word[7] = 0x0000; /* Src B pitch */
p[6]->word[8] = 0x000; /* Dst B pitch */
p[6]->word[9] = 0x0400; /* Src C pitch */
p[6]->word[10] = 0x0000; /* Dst C pitch */



for (current_block=0;current_block<40;current_block = current_block + 4)
{

for (j=0;j<3;j++)

{

for (i=0;i<4;i++)

{

if (j!=0)

{

while((INTPEN & (1<<(16+i)))==0x00); /* poll PP for finished block transformations */

INTPEN = 0x10000<<i; /* clear PP interrupt */

}

p[1]->word[1] = 0x80320000 + ((current_block+i)>>3)*0x04000 +
((current_block+i)&7)*64 + 0x14000*j; /* Src for block changes */

p[1]->word[2] = 0x00008000 + (i<<12); /* Dst address is internal RAM2 */

*ptr = (long) p[1];

/* kick off 1024 byte (32x16 words) coefficient block transfer from
SDRAM to internal */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

p[1]->word[1] = p[1]->word[1] + 0x200; /* Src for block changes */

p[1]->word[2] = p[1]->word[2] + 0x400; /* Dst address is internal DATA RAM2 */

/* kick off 1024 byte (32x16 words) coefficient block transfer from
SDRAM to internal */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

command(0x00002000 + (1<<i)); /* send msg interrupt to PP that will process these 2 blocks */

} /* end for i */

} /* end for j */

/* Move Buffer (0x80420000) to internal memory (2 parts) */

for (i=0;i<4;i++)

{

while((INTPEN & (1<<(16+i)))==0x00); /* poll PP for finished block transformations */

INTPEN = 0x10000<<i; /* clear PP interrupt */

/* Move buffer in (two steps required) */

p[6]->word[2] = 0xc00 + i*0x1000; /* Dst is DATA RAM 1 (half way thru) */

p[6]->word[3] = p[6]->word[4] = p[6]->word[9] = 0x400; /* length is 1024 bytes */

*ptr = (long) p[6];

/* kick off 1024 byte (2x256x1 words) coefficient block transfer from
SDRAM to internal */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

p[6]->word[2] = 0x8000 + i*0x1000; /* Dst is DATA RAM 2 */

p[6]->word[3] = p[6]->word[4] = p[6]->word[9] = 0x800; /* length is 2048 bytes */

/* kick off 2048 byte (4x256x1 words) coefficient block transfer from
SDRAM to internal */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

command(0x00002000 + (1<<i)); /* send msg interrupt to PP to difference values and start Quantization */

}

/* Move internal memory to buffer */

p[7]->word[1] = 0x0000; /* Src for block changes */

*ptr = (long) p[7];

for (i=0;i<4;i++)

{

while((INTPEN & (1<<(16+i)))==0x00); /* poll PP for finished block transformations */

INTPEN = 0x10000<<i; /* clear PP interrupt */

/* kick off 3072 byte (6x256x1 words) coefficient block transfer from
internal to 0x80400000 */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

}

} /* end for current_block */ 
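
The source and destination addresses computed for `p[1]` in the loop above follow a fixed pattern: blocks are taken eight to a row-group of 0x4000 bytes, 64 bytes apart within a group, with a further 0x14000-byte offset per pass `j`, and each parallel processor `i` receives its data at a 0x1000-byte stride. A minimal sketch of that address arithmetic, with hypothetical function names and the constants copied from the listing:

```c
#include <stdint.h>

/* Base address of the coefficient buffer as used in the listing. */
#define COEFF_BASE 0x80320000u

/* Source address of coefficient block `block` (current_block + i in the
   listing) on pass `j`. */
static uint32_t block_src_address(uint32_t block, uint32_t j)
{
    return COEFF_BASE
         + (block >> 3) * 0x04000u   /* group of eight blocks */
         + (block & 7u) * 64u        /* position within the group */
         + 0x14000u * j;             /* per-pass offset */
}

/* Destination address in internal RAM for parallel processor `pp` (0..3). */
static uint32_t block_dst_address(uint32_t pp)
{
    return 0x00008000u + (pp << 12);
}
```

For block 9 on pass 1 this gives 0x80320000 + 0x4000 + 0x40 + 0x14000 = 0x80338040, and processor 3 receives its data at 0x0000B000.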


/* Read back local maxvals and compute global maxval for all PP's */

temp_maxval = NOCACHE_INT(local_maxval[0]);

if (NOCACHE_INT(local_maxval[1]) > temp_maxval) temp_maxval = NOCACHE_INT(local_maxval[1]);
if (NOCACHE_INT(local_maxval[2]) > temp_maxval) temp_maxval = NOCACHE_INT(local_maxval[2]);
if (NOCACHE_INT(local_maxval[3]) > temp_maxval) temp_maxval = NOCACHE_INT(local_maxval[3]);

i=0;

while(temp_maxval!=1)

{

i++;

temp_maxval = temp_maxval>>1;

}

temp_maxval = 1<<i;

NOCACHE_INT(global_maxval) = i;

command(0x0000200F); /* send msg interrupt to PP0,1,2,3 */
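
The reduction above takes the largest of the four per-processor maxima and stores the position of its highest set bit, that is, floor(log2(max)). A minimal sketch under that reading; the function names are hypothetical, and the guard for a zero maximum is an addition, since the loop in the listing would not terminate if every local maximum were zero:

```c
#include <stdint.h>

/* Position of the highest set bit, i.e. floor(log2(v)) for v >= 1.
   Returns 0 for v == 0 (an assumption of this sketch; the listing has
   no such guard). */
static unsigned floor_log2(uint32_t v)
{
    unsigned n = 0;
    if (v == 0)
        return 0;
    while (v != 1) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Largest of `count` local maxima, reduced to its bit position. */
static unsigned global_maxval_exponent(const uint32_t local_maxval[], int count)
{
    uint32_t m = local_maxval[0];
    int k;
    for (k = 1; k < count; k++) {
        if (local_maxval[k] > m)
            m = local_maxval[k];
    }
    return floor_log2(m);
}
```

With local maxima of 3, 17, 255 and 64 the largest value is 255, whose highest set bit is bit 7.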

/* Continue with Quantization and differencing */ 
p[6]->word[1] = 0x80420000; 

p[3]->word[2] = 0x80420000; /* Src for block changes */ 

p[7]->word[1] = 0x80400000; 
p[7]->word[10] = 0x0000;

for (current_block=0;current_block<40;current_block = current_block + 4)

{

/* Write coefficients back into internal memory (0x0000) from external buffer
(0x80400000) and kick off Quant. */

save_p7_src = p[7]->word[1];

for (i=0;i<4;i++)

{

p[7]->word[2] = i*0x1000; /* Dst is DATA RAM 1 (half way thru) */

p[7]->word[3] = p[7]->word[4] = p[7]->word[9] = 0xc00; /* length is 3072 bytes */

*ptr = (long) p[7];

/* kick off 1024 byte (2x256x1 words) coefficient block transfer from
SDRAM to internal */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

*(unsigned short *)(0x01000600 + (i<<12)) = local_alloc[current_block+i];

command(0x00002000 + (1<<i)); /* send msg interrupt to PP to start Quantization */

}

p[9]->word[1] = 0x01000630; /* Src address is internal Parameter RAM */

p[7]->word[1] = save_p7_src;

for (i=0;i<4;i++)

{

/* Read the number of bytes that need to be transferred */

p[9]->word[3] = p[9]->word[4] = p[9]->word[10] = local_alloc[current_block+i];

*ptr = (long) p[9];

while((INTPEN & (1<<(16+i)))==0x00); /* poll PP for finished Quantization */

INTPEN = 0x10000<<i; /* clear PP interrupt */

/* kick off transfer from internal to SDRAM */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

/* Move coefficients in (two steps required) */

p[7]->word[2] = 0xc00 + i*0x1000; /* Dst is DATA RAM 1 (half way thru) */

p[7]->word[3] = p[7]->word[4] = p[7]->word[9] = 0x400; /* length is 1024 bytes */

*ptr = (long) p[7];

/* kick off 1024 byte (2x256x1 words) coefficient block transfer from
SDRAM to internal */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

p[7]->word[2] = 0x8000 + i*0x1000; /* Dst is DATA RAM 2 */

p[7]->word[3] = p[7]->word[4] = p[7]->word[9] = 0x800; /* length is 2048 bytes */

/* kick off 2048 byte (4x256x1 words) coefficient block transfer from
SDRAM to internal */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

command(0x00002000 + (1<<i)); /* send msg interrupt to PP to special difference values */

}

/* Move buffer to internal memory (2 parts) */

for (i=0;i<4;i++)

{

while((INTPEN & (1<<(16+i)))==0x00); /* poll PP for finished block transformations */

INTPEN = 0x10000<<i; /* clear PP interrupt */

/* Move coefficients from buffer in */

p[6]->word[2] = i*0x1000 + 0xc00; /* Dst is DATA RAM 1 */

p[6]->word[3] = p[6]->word[4] = p[6]->word[9] = 0x400; /* length is 1024 bytes */

*ptr = (long) p[6];

/* kick off 1024 byte (2x256x1 words) coefficient block transfer from
SDRAM to internal */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

p[6]->word[2] = i*0x1000 + 0x8000; /* Dst is DATA RAM 2 */

p[6]->word[3] = p[6]->word[4] = p[6]->word[9] = 0x800; /* length is 2048 bytes */

*ptr = (long) p[6];

/* kick off 1024 byte (2x256x1 words) coefficient block transfer from
SDRAM to internal */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);
command(0x00002000 + (1<<i)); /* send msg interrupt to PP to sum values */

}

/* Move internal memory to buffer */

*ptr = (long) p[3];

for (i=0;i<4;i++)

{

while((INTPEN & (1<<(16+i)))==0x00); /* poll PP for finished block transformations */

INTPEN = 0x10000<<i; /* clear PP interrupt */

p[3]->word[1] = i*0x1000; /* Src for block changes */

/* kick off 3072 byte (6x256x1 words) coefficient block transfer from
internal to SDRAM */

PKTREQ |= MP_PKTREQ_P_BIT; i = i + 1; i = i - 1; while(PKTREQ & 0x02);

}

}


/* internal RAM -> DRAM (rows and 16 bit data) */

/* Can be used 4 times then change src then this can be repeated 8 times */ 


p[7]->word[0] = 0x80000200; /* int to DRAM w/src updt */ 
p[7]->word[1] = 0x00000000; /* Src address is internal */ 
p[7]->word[2] = 0x80320000; /* Dst address is DRAM */ 
p[7]->word[3] = 0x00001000; /* Src B count Src A count */ 
p[7]->word[4] = 0x00001000; /* Dst B count Dst A count */ 
p[7]->word[9] = 0x1000; /* Src C pitch */ 
p[7]->word[10] = 0x0000; /* Dst C pitch */ 


/* DRAM (rows and 16 bit data) -> internal RAM */ 

/* Can be used 4 times then change dst then this can be repeated 8 times */ 


p[6]->word[0] = 0x80000002; /* Contig. Mem to int w/dst updt */
p[6]->word[1] = 0x80320000; /* Src address is DRAM */
p[6]->word[2] = 0x00000000; /* Dst address is internal */
p[6]->word[3] = 0x00001000; /* Src B count Src A count */
p[6]->word[4] = 0x00001000; /* Dst B count Dst A count */
p[6]->word[5] = 0x00; /* Src C count */
p[6]->word[6] = 0; /* Dst C count */
p[6]->word[7] = 0x0000; /* Src B pitch */
p[6]->word[8] = 0x000; /* Dst B pitch */
p[6]->word[9] = 0x0000; /* Src C pitch */
p[6]->word[10] = 0x1000; /* Dst C pitch */


/* DRAM (rows and 8 bit data) -> internal RAM */

/* Can be used 4 times then change dst <- this can be done 8 times */

p[3]->word[0] = 0x80000202; /* Contig. Mem to int w/s update */
p[3]->word[1] = 0x80300000; /* Src address is DRAM */
p[3]->word[2] = 0x00000000; /* Dst address is internal */
p[3]->word[3] = 0x00000800; /* Src B count Src A count */
p[3]->word[4] = 0x00000800; /* Dst B count Dst A count */
p[3]->word[5] = 0x00; /* Src C count */
p[3]->word[6] = 0; /* Dst C count */
p[3]->word[7] = 0x0000; /* Src B pitch */
p[3]->word[8] = 0x000; /* Dst B pitch */
p[3]->word[9] = 0x0800; /* Src C pitch */
p[3]->word[10] = 0x1000; /* Dst C pitch */
ppstarttime = TCOUNT;

*pp_stop_encode = 1;

command(0x0000200F); /* send msg interrupt to all PPs */


#endif


/* Write dynamic values to the bitstream */
temp_ulong = 0;

temp_ulong = temp_ulong | (tbuf_pp0[0] << 24);
temp_ulong = temp_ulong | (global_mean << 16);
temp_ulong = temp_ulong | (global_maxval << 8);

NOCACHE_INT(*(UINT32 *) stream_addr[flip]) = temp_ulong;

temp_ulong = *compression_ratio;

#ifndef variable_compression

NOCACHE_INT(*(UINT32 *) (stream_addr[flip] + 4)) = temp_ulong & 0x000000ff;
#else

NOCACHE_INT(*(UINT32 *) (stream_addr[flip] + 4)) = temp_ulong;

#endif
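
The first header word written above packs three per-frame values into one 32-bit word: `tbuf_pp0[0]` in bits 31..24, the global mean in bits 23..16 and the global maxval exponent in bits 15..8. A minimal sketch of that packing and the matching unpacking a decoder might use; the function names are hypothetical, and the 8-bit masks are an assumption added here so that an oversized field cannot spill into its neighbour (the listing shifts without masking):

```c
#include <stdint.h>

/* Pack the three dynamic values into the first header word. */
static uint32_t pack_header(uint32_t tbuf0, uint32_t mean, uint32_t maxval_exp)
{
    return ((tbuf0 & 0xffu) << 24)
         | ((mean & 0xffu) << 16)
         | ((maxval_exp & 0xffu) << 8);
}

/* Field extractors for the receiving side. */
static uint32_t header_tbuf0(uint32_t word)  { return (word >> 24) & 0xffu; }
static uint32_t header_mean(uint32_t word)   { return (word >> 16) & 0xffu; }
static uint32_t header_maxval(uint32_t word) { return (word >> 8) & 0xffu; }
```

For instance, values 0x12, 0x80 and 9 pack to 0x12800900, and the extractors recover the same three fields.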

flip = 1 - flip; 

#ifdef HOST 

ReturnVal = CilRaiseSignalNumber(FrameDoneBit);
if (ReturnVal != CIL_OK)

{ 

/*** Cannot raise signal ***/ 
return; 

} 

#endif 

proc_time = 0xffffffff - TCOUNT;

#ifdef SHOWDISPLAY

if (Semaphore == 0)

{ 

p[7]->word[3] = 0x00001000; 

p[7]->word[4] = 0x00001000; 

p[7]->word[6] = 0x00000000; 

p[7]->word[8] = 0x00000000; 

p[7]->word[10] = 0x00001000; 


/* DRAM (8 bit data) -> VRAM (raw image) */ 


p[2]->link = p[2]; /* point to next PT */
p[2]->word[0] = 0x80000000; /* linear to VRAM */
p[2]->word[1] = 0x80300000; /* Src address is DRAM */
p[2]->word[2] = 0xb4000000; /* Dst address is VRAM */
p[2]->word[3] = 0x00110010; /* Src B count Src A count */
p[2]->word[4] = 0x00110010; /* Dst B count Dst A count */
p[2]->word[5] = 0x00; /* Src C count */
p[2]->word[6] = 0; /* Dst C count */
p[2]->word[7] = 0xb0; /* Src B pitch */
p[2]->word[8] = 0x400; /* Dst B pitch */
p[2]->word[9] = 0x00; /* Src C pitch */
p[2]->word[10] = 0x0000; /* Dst C pitch */
p[2]->word[11] = 0; /* Src transparency upper */
p[2]->word[12] = 0; /* Src transparency lower */
p[2]->word[13] = 0; /* Reserved */
p[2]->word[14] = 0; /* Reserved */


temp_time = 1.0/(encode_time*0.000000025); 

/* temp_time = encode_time*0.000025; */ 

sprintf(stringl,"%f",temp_time); 

for (i=0;i<4;i++) 

{ 

if (stringl[i]!='.')

{

temp_str = stringl[i];
temp_str = temp_str - 48;
temp_str = temp_str * 16;

} 

else 

{ 

temp_str = 160; 

} 

p[2]->word[1] = (long)&number_pixels; 
p[2]->word[1] += temp_str; 
p[2]->word[2] = 0xb4080000 + i*16; 

*ptr = (long) p[2];

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from internal to VRAM */ 

i = i + 1; 
i = i - 1; 

while(PKTREQ & 0x02); 

} 


temp_time = proc_time*0.000025; 

sprintf(stringl,"%f",temp_time);

for (i=0;i<4;i++) 

{ 

if (stringl[i]!='.') 

{ 

temp_str = stringl[i];
temp_str = temp_str - 48;
temp_str = temp_str * 16;

} 

else 

{ 

temp_str = 160;

} 

p[2]->word[1] = (long)&number_pixels; 
p[2]->word[1] += temp_str;
p[2]->word[2] = 0xb4084800 + i*16;

*ptr = (long) p[2];

PKTREQ |= MP_PKTREQ_P_BIT; /* start xfer from internal to VRAM */ 

i = i + 1; 
i = i - 1;

while(PKTREQ & 0x02); 

} 


/* DRAM (8 bit data) -> VRAM (raw image) */ 


p[2]->link = p[2]; /* point to next PT */
p[2]->word[0] = 0x80000000; /* linear to VRAM */
p[2]->word[1] = 0x80300000; /* Src address is DRAM */
p[2]->word[2] = 0xb4000000; /* Dst address is VRAM */
p[2]->word[3] = 0x00038000; /* Src B count Src A count */
p[2]->word[4] = 0x00ff0200; /* Dst B count Dst A count */
p[2]->word[5] = 0x00; /* Src C count */
p[2]->word[6] = 0; /* Dst C count */
p[2]->word[7] = 0x8000; /* Src B pitch */
p[2]->word[8] = 0x800; /* Dst B pitch */
p[2]->word[9] = 0x00; /* Src C pitch */
p[2]->word[10] = 0x0000; /* Dst C pitch */
p[2]->word[11] = 0; /* Src transparency upper */
p[2]->word[12] = 0; /* Src transparency lower */
p[2]->word[13] = 0; /* Reserved */
p[2]->word[14] = 0; /* Reserved */

}

#endif

} /* end while */


} /* end task */ 
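
The task above starts each output frame with a 32-bit header word built by shifting three fields left by 24, 16 and 8 bits. The following is a minimal sketch of that packing and its inverse, assuming each field fits in 8 bits; the names pack_header and unpack_header and the use of the low byte are hypothetical and do not appear in the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Pack three 8-bit fields into the top three bytes of a 32-bit word,
   mirroring the shifts by 24, 16 and 8 in the listing. The meaning of
   the low byte is an assumption of this sketch. */
static uint32_t pack_header(uint8_t tbuf0, uint8_t mean, uint8_t maxval,
                            uint8_t low)
{
    return ((uint32_t)tbuf0 << 24) | ((uint32_t)mean << 16) |
           ((uint32_t)maxval << 8) | (uint32_t)low;
}

/* Recover the four bytes on the receiving side. */
static void unpack_header(uint32_t w, uint8_t *tbuf0, uint8_t *mean,
                          uint8_t *maxval, uint8_t *low)
{
    assert(tbuf0 && mean && maxval && low);
    *tbuf0  = (uint8_t)(w >> 24);
    *mean   = (uint8_t)(w >> 16);
    *maxval = (uint8_t)(w >> 8);
    *low    = (uint8_t)w;
}
```

A ground-station decoder would call unpack_header on the first word of each frame before reading the compression ratio stored in the following word.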

extern int ep_runpp0; 

main () 

{ 

int i; 

unsigned int temp; 

unsigned int *src_ptr = (unsigned int *)0x90080000; 
unsigned int *dst_ptr = (unsigned int *)0x80000000; 

/* REFCNTL = 0xffff0138; */ /* set up DRAM and SDRAM with the correct refresh rate
for a 40 MHz C80 */

REFCNTL = 0xffff0186; /* set up DRAM and SDRAM with the correct refresh rate for a
50 MHz C80 */


command(0xc000000f); /* reset and halt PP0,1,2,3 */

*(int *)0x010001b8 = (int)&ep_runpp0; /* initialize task vector */
*(int *)0x010011b8 = (int)&ep_runpp0; /* initialize task vector */
*(int *)0x010021b8 = (int)&ep_runpp0; /* initialize task vector */
*(int *)0x010031b8 = (int)&ep_runpp0; /* initialize task vector */


/* upload PP code */ 

/* memcpy( 0x80000000, 0x90200000, 0x9020bae0 - 0x90200000); */ 




for (i=0;i<34000;i++) /* copy the PP code image word by word */

{

temp = *src_ptr++;

*dst_ptr++ = temp;

}

for (i=0;i<20000;i++) /* short busy-wait before starting the PPs */

{

temp = temp + 1;
temp = temp - 1;

}

command(0x3000000f); /* start PP0,1,2,3 by unhalting it */ 

/* all will take its task interrupt*/ 


/* Basic init functions */ 


#ifdef HOST 

InterruptInit(); /* Init ME interrupts */

#endif 

/* Basic init functions */ 


PtReqInit(); /* Init the ME PT functions */
TaskInitTasking(); /* Init tasking */
IclInstallPtdMalloc(); /* Install protected malloc and free function to ME */
IclPTInit(15); /* Init the Icl PT server task with a priority of 15 */


#ifdef HOST 
/* 

** Initialise the Cil 

** Declare 4 buffers of 256 bytes each. 

** These buffers are not used here - choose minimum sizes. 
*/ 

CilInit(4,256);

#endif 


TaskResume(TaskCreate (-1,task, NULL, 14, 4096)); /* Start task */ 


while(1==1); /* loop */ 

} 


/********************************************************************
* *
* Function : SignalHandler *
* Args : UINT32 Signals *
* *
* Description: *
* Signals Signals raised by host *
* *
* SignalHandler will be called when host raises signal *
* *
* Return Values: *
* None *
* *
********************************************************************/


void 

SignalHandler(UINT32 Signals) 

{ 

if ((Signals & HostRequestBitMask)!=0) 

{
/*

** Host has requested a block of data
*/

/* TaskSignalSema(Semaphore); */ 

Semaphore = 1; 

} 

/*

** Ignore any other signals raised - they're not for us
*/

}


/* Initialize the adaptive bit allocation tables on the PPs */
void

init_alloc_table(UINT32 index)

{ 

int t_alloc; 
int p_alloc; 
int r_alloc; 
int k,lp; 

t_alloc = YSIZE*XSIZE/comp_table[index] - headersize;
p_alloc = t_alloc/((AY*AX)/6); 
r_alloc = t_alloc%((AY*AX)/6); 

for(k=0;k<((AY*AX)/6);k++) 
local_alloc[k] = p_alloc; 


lp = 0;

while (r_alloc>0)

{

local_alloc[lp++]++;
r_alloc--;

} 


} 
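
init_alloc_table divides the frame's byte budget evenly over (AY*AX)/6 entries and then hands the remainder out one byte at a time, starting from the first entry. The sketch below restates that arithmetic in a self-contained form; split_budget and its parameters are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Split `total` bytes over `n` slots: every slot receives total / n,
   and the first total % n slots receive one extra byte, as the
   while loop in the listing does. */
static void split_budget(int total, int n, int *alloc)
{
    int k, base, extra;

    assert(n > 0 && total >= 0 && alloc);
    base  = total / n;
    extra = total % n;
    for (k = 0; k < n; k++)
        alloc[k] = base + (k < extra ? 1 : 0);
}
```

With a budget of 10 bytes and 4 slots this yields 3, 3, 2, 2, so the slot totals always add up to the full budget.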

void 

init_alloc_table2(UINT32 index) 

{ 

int t_alloc; 
int p_alloc; 
int r_alloc; 
int k, lp;

int bytes_per_frame; 

bytes_per_frame = index; 

t_alloc = bytes_per_frame - headersize; 
p_alloc = t_alloc/((AY*AX)/6); 
r_alloc = t_alloc%((AY*AX)/6); 

for(k=0;k<((AY*AX)/6);k++) 
local_alloc[k] = p_alloc; 


lp = 0; 

while (r_alloc>0) 

{ 

local_alloc[lp++]++;
r_alloc--;

} 


}

/***************************************************************************
* *
* Description: *
* *
* PCI/C80 MP linker file *
* *
***************************************************************************/

-c 

-heap 0x100000 
-stack 0x10000 
-l mp_cio.lib
-l \pcic80\lib\mp_task.lib
-l mp_int.lib
-l mp_rts.lib
-l mp_ptreq.lib
-l mp_ppcmd.lib
-l ppcmd.lib
-l \pcic80\lib\icl.lib
-l \pcic80\lib\vol.lib
-l \pcic80\lib\bgl.lib
-l \pcic80\lib\vil24.lib
-l \pcic80\lib\cil.lib


MEMORY 

{ 

RAM0 : o=0x00000000 l=0x00800
RAM1 : o=0x00000800 l=0x00800
RAM2 : o=0x00008000 l=0x00800
RESERV : o=0x01000000 l=0x00200
MPPRAM1 : o=0x01010580 l=0x00280
MPPRAM : o=0x010007D8 l=0x00028
SDRAM : o=0x80000000 l=0x300000
DRAM : o=0x90000000 l=0x80000
DRAM2 : o=0x90080000 l=0x100000
UNINIT : o=0x90180000 l=0x180000
IMAGE : o=0x80300000 l=0x80000
SPOT : o=0x80380000 l=0x10000
VRAM_PAL : o=0xB0000000 l=0x200000
VRAM_VGA : o=0xB4000000 l=0x400000

} 


SECTIONS 

{ 

/* 

* The following section must be defined for all program that 

* use the CIL. The section must appear in the first 8Mb 

* of DRAM and must be long enough to include all buffers 

* plus 128 bytes. This example is big enough for 4*256byte 

* buffers. 

* 

* See the user guide for more information. 

*/ 

.lsidram : { 

_CilDRAMBase = .; 

. += 0x600; 

} > DRAM 


.text : > DRAM
.cinit : > DRAM
.const : > DRAM
.switch : > DRAM
.data : > DRAM
.bss : > DRAM
.cio : > DRAM
.pcinit : > DRAM
.ptext : load > DRAM2, run SDRAM
font : load > DRAM2, run SDRAM
.sysmem : > UNINIT
.stack : > UNINIT
sh_vars : > MPPRAM
mp_vars : > MPPRAM1
rawimage : > IMAGE
stream : > SPOT

}

/***************************************************************************
* *
* Description: *
* *
* PCI/C80 PP linker file *
* *
***************************************************************************/

-pc 

-l d:\mvp\src\newlib\pp_rts.lib

-pstack 192 

MEMORY 

{ 

RAM0 : o=0x00000000 l=0x00800
RAM1 : o=0x00000800 l=0x00800
RAM2 : o=0x00008000 l=0x00800
RESERV : o=0x01000000 l=0x00200
PRAM : o=0x01000200 l=0x00600
SDRAM : o=0x80000000 l=0x800000
DRAM : o=0x90400000 l=0x400000
VRAM_PAL : o=0xB0000000 l=0x200000
VRAM_VGA : o=0xB4000000 l=0x400000

}

SECTIONS

{

.text : > DRAM
.ptext : > DRAM
.cinit : > DRAM
.const : > DRAM
.switch : > DRAM
.data : > DRAM
.bss : > DRAM
.cio : > DRAM
.sysmem : > DRAM
.stack : > DRAM
.pcinit : > DRAM
.pbss : (PASS) > PRAM
.psysmem : (PASS) > PRAM
.pstack : (PASS) > PRAM

}




/*****************************************************************************
** **
** subpass.s (PP Program)
** **
** Written by Jim Witham, Code 472300D, 939-3599
** **
** File that contains the following assembly language subroutines:
** **
** $subpass, $encode_symbol2, $update_model2, $bit_plus_follow2, I_DIV_JW2,
** $new_output_bit2
** **
** The 2 at the end of the subroutine name indicates that these subroutines
** were copied here from another file and called locally to not have cache
** faults.
** **
*****************************************************************************/

.global $subpass
.global $stats_flag ;unsigned char pointer *(xba + $stats_flag)
.global $char_to_index ;unsigned char pointer &*(xba + $char_to_index)
.global $stats_val ;signed short pointer *(xba + $stats_val)
.global $BITE ;signed int sw *(xba + $BITE)
.global $STOP ;signed int sw *(xba + $STOP)

.global $sym_index
.global $sym_array

.global $list
.global $list_index

.global $bit_index

.global $index_to_char
.global $getaway_address
.global $quick_getaway
.global $byte_stream
.global $No_of_symbols
.global $bits_to_follow
.global $T_BYTES
.global $high
.global $freq
.global $low
.global $cum_freq

.global end_subpass3

tempd1 .set d4
tempd2 .set d2
tempd3 .set d5
tempd4 .set d1
m .set d6
tt .set d7
list .set a12
stats_val .set a4
i .set 20
j .set 24
t .set 28

.align 512


/*******************************************************************
* *
* Function : $subpass *
* Args : t *
* Passed in : d1 *
* *
* Description: *
* t - the current THRESH >> BITE value. *
* *
* $subpass will be called to perform a subordinate pass over *
* the coefficients. *
* *
* Return Values: *
* none *
* *
*******************************************************************/


$subpass:

d0 = &*(sp -= 32)
*(sp + 12) =w iprs
*(sp + 8) =w d6
*(sp + 16) =w a4
*(sp + 4) =w a12
*(sp + 0) =w d7

*(sp + t) = d1
list =uw *(xba + $list)
tempd1 =uh *(xba + $list_index) ;repeat loop list_index times

mloop:
le0 = firstloop
lrs0 = tempd1 - 1
nop
tempd2 =uh *(list++)
stats_val =uw *(xba + $stats_val)
tempd1 =uw *(xba + $BITE)
tt = *(sp + t)
|| tempd2 = tempd2<<1
le1 = endjloop
lrs1 = tempd1 - 1

stats_val = stats_val + tempd2
tt = tt >>u 1

jloop:
;>>>> sym = (stats_val[(s*768)+i]>0)&1;
;>>>> stats_val[(s*768)+i] -= ((sym>0)?1:-1)*t/2;

tempd2 =sh *(stats_val)
|| tempd1 = tt
tempd2 = tempd2 - 0
tempd4 =[.nvz] 1 || tempd4 =[le.nvz] zero
tempd1 =[le] -tt
tempd2 = tempd2 - tempd1
|| x1 = tempd4
*(stats_val) =sh tempd2
a1 = &*(xba + $char_to_index)
call = $encode_symbol2
d6 =ub *(a1 + x1)
d1 = d6
call = $update_model2
nop
d1 = d6
d0 =sw *(xba + $STOP)
d0 = d0 - 0
br =[ne] done
nop
nop

endjloop:
tt = tt >>u 1

firstloop:
tempd2 =uh *(list++)

done:
a12 =sw *(sp + 4)
a4 =sw *(sp + 16)
br = *(sp + 12)
d6 =sw *(sp + 8)
|| d7 =sw *(sp + 0)
d0 = &*(sp ++= 32)

; branch occurs here 
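
The two C statements kept as comments inside jloop describe the per-coefficient refinement performed by $subpass. As a minimal sketch, assuming t is the halved threshold passed in d1 and that the returned symbol is what gets arithmetic coded, the step can be written as follows; refine_coefficient is a hypothetical name.

```c
#include <assert.h>

/* One subordinate-pass step on a single coefficient:
   emit 1 if the residual is positive, else 0, then move the residual
   toward zero by t/2 (or away from it when the sign test fails),
   exactly as in the commented C statements in the listing. */
static int refine_coefficient(short *v, int t)
{
    int sym, step;

    assert(v);
    sym  = (*v > 0) & 1;
    step = ((sym > 0) ? 1 : -1) * t / 2;
    *v   = (short)(*v - step);
    return sym;
}
```

For example, a residual of 5 with t = 4 emits symbol 1 and leaves a residual of 3, while a residual of -3 emits symbol 0 and leaves -1.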



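/*******************************************************************
* *
* Function : encode_symbol2 *
* Args : none *
* *
* Description: *
* encode_symbol will be called to perform the arithmetic *
* encoding of a symbol. This routine was lifted from a C *
* compiled program and included here, for cache coherency. *
* *
* Return Values: *
* None *
* *
*******************************************************************/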


$encode_symbol2:

x0 = d1
a0 = &*(xba + $cum_freq)
d3 =sw *(xba + $low)
d2 =sw *(xba + $high)
d1 = x0 << 1
d4 = d2 - d3
d2 =g a0
a1 = d1 + d2
d0 = &*(sp -= 4)
d4 = d4 + 1
|| *(sp) =w iprs
d1 =uh1 d4
|| d5 =sh *(a1 - 2)
d1 =u d5 * d1
|| d2 =uh1 d5
d2 =u d2 * d4
d1 =u d5 * d4
|| d2 = d2 + d1
call = I_DIV_JW
d1 = d1 + (d2 << 16)
|| x1 =sh *a0
d2 = x1
d0 =uh1 d4
|| d1 =sh *(a0 + [x0])
d0 =u d0 * d1
|| d2 =uh1 d1
d2 =u d4 * d2
d1 =u d4 * d1
|| d2 = d0 + d2
d1 = d1 + (d2 << 16)
d4 = d5 + d3
|| d2 = x1
call = I_DIV_JW
d4 = d4 - 1
*(xba + $high) =w d4
d2 = d5 + d3
d1 = d4 - (1 \\ 8)
br =[ge] L25
*(xba + $low) =w d2
d1 =[ge] d2 - (1 \\ 8)

L19:
call = $bit_plus_follow
nop
d1 = 0
br = L23
d1 =sw *(xba + $high)
d1 = d1 << 1

L20:
d1 =sh *(xba + $bits_to_follow)
d2 = d1 + 1
|| d3 =sw *(xba + $low)
*(xba + $bits_to_follow) =h d2
d2 = d3 - (1 \\ 7)
|| d1 =sw *(xba + $high)
br = L22
d1 = d1 - (1 \\ 7)
|| *(xba + $low) =w d2
*(xba + $high) =w d1

L21:
call = $bit_plus_follow
nop
d1 = 1
d1 =sw *(xba + $low)
d1 = d1 - (1 \\ 8)
|| d2 =sw *(xba + $high)
d1 = d2 - (1 \\ 8)
|| *(xba + $low) =w d1
*(xba + $high) =w d1

L22:
d1 = d1 << 1

L23:
d4 = d1 + 1
d1 =sw *(xba + $low)
d2 = d1 << 1
d1 = d4 - (1 \\ 8)
br =[lt] L19
*(xba + $high) =w d4
*(xba + $low) =w d2

L24:
d1 = d2 - (1 \\ 8)

L25:
br =[ge] L21
nop
d1 =[lt] d2 - (1 \\ 7)
br =[lt] L30
nop
d1 =[ge] d4 - 384
br =[lt] L20
br =[ge] L30
nop
nop

L30:
br = *(sp)
nop
d0 = &*(sp ++= 4)
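
The routine above follows the usual integer arithmetic-coding pattern: narrow the [low, high] interval by the symbol's cumulative frequencies, then shift out bits while the interval lies in one half or straddles the middle. The sketch below is a C restatement under stated assumptions: the constants 128, 256 and 384 are read from the (1 \\ 7), (1 \\ 8) and 384 operands in the listing and interpreted as quarter points of a 9-bit code range, and the Encoder structure and put_bit sink are hypothetical stand-ins for the global variables and $new_output_bit.

```c
#include <assert.h>

enum { CODE_BITS = 9, TOP = (1 << CODE_BITS) - 1,
       FIRST_QTR = 128, HALF = 256, THIRD_QTR = 384 };

typedef struct {
    long low, high;          /* current coding interval            */
    int  bits_to_follow;     /* opposite bits still owed           */
    int  nbits;              /* number of bits emitted so far      */
    unsigned char bits[256]; /* one emitted bit per element        */
} Encoder;

static void put_bit(Encoder *e, int bit)
{
    assert(e->nbits < (int)sizeof e->bits);
    e->bits[e->nbits++] = (unsigned char)bit;
}

/* Emit a bit followed by the pending opposite bits. */
static void bit_plus_follow(Encoder *e, int bit)
{
    put_bit(e, bit);
    while (e->bits_to_follow > 0) {
        put_bit(e, !bit);
        e->bits_to_follow--;
    }
}

/* cum_freq[0] holds the total count and cum_freq[s-1] > cum_freq[s]. */
static void encode_symbol(Encoder *e, int s, const short *cum_freq)
{
    long range = e->high - e->low + 1;

    assert(s >= 1 && cum_freq[s - 1] > cum_freq[s]);
    e->high = e->low + (range * cum_freq[s - 1]) / cum_freq[0] - 1;
    e->low  = e->low + (range * cum_freq[s]) / cum_freq[0];
    for (;;) {
        if (e->high < HALF) {
            bit_plus_follow(e, 0);            /* interval in lower half */
        } else if (e->low >= HALF) {
            bit_plus_follow(e, 1);            /* interval in upper half */
            e->low -= HALF;  e->high -= HALF;
        } else if (e->low >= FIRST_QTR && e->high < THIRD_QTR) {
            e->bits_to_follow++;              /* interval straddles middle */
            e->low -= FIRST_QTR;  e->high -= FIRST_QTR;
        } else {
            break;
        }
        e->low  = 2 * e->low;
        e->high = 2 * e->high + 1;
    }
}
```

With two equally likely symbols, encoding symbol 1 from the full interval emits a single 1 bit and restores the interval to [0, 511]; encoding symbol 2 next emits a single 0 bit.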




/*******************************************************************
* *
* Function : I_DIV_JW2 *
* Args : none *
* *
* Description: *
* I_DIV_JW will be called to perform an Integer Divide. *
* *
* Return Values: *
* None *
* *
*******************************************************************/

;******************************************************************************
;* I_DIV.ASM v1.10 - Integer Divide *
;* Copyright (c) 1993-1995 Texas Instruments Incorporated *
;******************************************************************************

; +--------------------------------------------------------------------+
; | i_div.asm = PP assembly program that is used to return a 32-bit |
; | signed integer quotient from 32-bit signed integer |
; | division when called by a C program. |
; +--------------------------------------------------------------------+

.global I_DIV_JW2

; +--------------------------------------------------------------------+
; | 32-bit Signed Integer Word Divide Subroutine : |
; | o Input 32-bit signed integer Operand 1 is in d1 (numerator). |
; | o Input 32-bit signed integer Operand 2 is in d2 (divisor). |
; | o Output 32-bit signed integer is in d5 (Answer = quotient). |
; | o Output 32-bit signed remainder is discarded. |
; | o 0 input divisor produces 0x80000000 output with overflow set. |
; | o Quotient = 0x80000000 sets overflow. |
; | o Number of Stack Words used = 3. |
; | o MF register is saved. |
; | o NOTE: Loop Counter 2 Registers are used but NOT restored ! |
; +--------------------------------------------------------------------+

; +--------------------------------------------------------------------+
; | 32 bit / 32 bit ===> 32 bit signed quotient |
; | Signed PP Integer Division |
; | Numerator / Denominator = Quotient + Remainder (discarded) |
; | Divide by 0 produces 80000000 and sets sr(V) |
; | Divide Overflow is not possible if Divisor is non-zero, |
; | except 80000000/ffffffff = 80000000 will set sr(V). |
; | MF register is preserved. |
; +--------------------------------------------------------------------+


.ptext ; PP assembly code

arg1: .set d1 ; input argument 1 = Numerator (32 low bits)
arg2: .set d2 ; input argument 2 = Divisor (32 bits)
ans: .set d5 ; answer = 32 bit signed quotient
Div: .set d3 ; Input Divisor
Num: .set d4 ; Input high Numerator = 0
Tmp: .set d5 ; ALU output for each DIVI

.align 8*16 ; start on a 16-instruction boundary

I_DIV_JW2: ; Signed Word Integer Divide: Ans = Op1 / Op2

Div = 0 - |arg2| ; negate |divisor|
|| *(sp -= [3]) = Div ; || push Div
br =[z] Div_By_0 ; Divide By 0 ?
Num = 0 ; high numerator = 0
|| *(sp+[1]) = mf ; || push mf
|| *(sp+[2]) = Num ; || push Num
mf = |arg1| ; input lo |numerator|

lrse2 = 29 ; loop count - 1
Tmp = divi(Div, Num=Num) ; 1-st divide iterate
Tmp = divi(Div, Num=Tmp [n] Num) ; 2-nd divide iterate
LoopSW: Tmp = divi(Div, Num=Tmp [n] Num) ; divide iterate 3-32

ans = mf ; |ans| = mf
|| Div = *sp++ ; || pop Div
Num = arg1 ^ arg2 ; quotient sign
|| br = iprs ; || return
ans =[n] -ans ; quotient is negative,
|| mf = *sp++ ; || pop mf
Num = *sp++ ; pop Num

Div_By_0: ; Divide By 0 \_ Optional Error
Div_Ovfl: ; Divide Overflow / Return Code

br = iprs ; return
|| Div = *sp++ ; || pop Div
mf = *sp++ ; pop mf
ans = 0 - 1<<31 ; returns 0x80000000, sets sr(V)
|| Num = *sp++ ; || pop Num ... [END]
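
The comment block for this routine fixes its observable behavior: a truncating signed 32-bit quotient, with 0x80000000 returned and overflow flagged for a zero divisor or for 0x80000000 / 0xffffffff. The following C sketch models that contract with a plain shift-and-subtract loop on magnitudes; it is not a transcription of the divi iteration, and pp_int_divide and its overflow out-parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Signed divide on magnitudes with the sign restored afterwards.
   *overflow models the sr(V) flag described in the listing. */
static int32_t pp_int_divide(int32_t num, int32_t den, int *overflow)
{
    uint32_t n, d, q = 0, r = 0;
    int64_t  sq;
    int      i;

    assert(overflow);
    *overflow = 0;
    if (den == 0 || (num == INT32_MIN && den == -1)) {
        *overflow = 1;
        return INT32_MIN;                /* 0x80000000 */
    }
    n = num < 0 ? 0u - (uint32_t)num : (uint32_t)num;
    d = den < 0 ? 0u - (uint32_t)den : (uint32_t)den;
    for (i = 31; i >= 0; i--) {          /* one quotient bit per pass */
        r = (r << 1) | ((n >> i) & 1u);
        if (r >= d) {
            r -= d;
            q |= (uint32_t)1 << i;
        }
    }
    sq = ((num < 0) != (den < 0)) ? -(int64_t)q : (int64_t)q;
    return (int32_t)sq;
}
```

The quotient truncates toward zero because the division is carried out on absolute values, which matches the negate-then-divide structure of the assembly.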


.global $update_model

/*******************************************************************
* *
* Function : update_model2 *
* Args : none *
* *
* Description: *
* update_model will be called to update the arithmetic *
* model's parameters. This routine was lifted from a C *
* compiled program and included here, for cache coherency. *
* *
* Return Values: *
* None *
* *
*******************************************************************/




$update_model2:

a0 = &*(xba + $cum_freq)
nop
a1 = d1
|| d1 =sh *a0
d1 = d1 - 75
br =[ne] L6
nop
a2 =g [ne.ncvz] a1
d2 =sw *(xba + $No_of_symbols)
d1 = d2 - 0
br =[lt] L6
nop
a2 =g [lt.ncvz] a1
d1 =g a0
le1 = L5 - 8
d5 = d2 << 1
d3 = &*(xba + $freq)
a0 = d5 + d1

d4 = d2 << 1
|| lrs1 = d2
a8 = d4 + d3
d2 = 0

L4:
d1 =sh *a8
d1 = d1 + 1
d3 = d1 >>u 31
d1 = d1 + d3
d1 = d1 >>s 1
d1 =sh0 d1
*a8 =h d1
*a0-- =h d2
d1 =sh *a8--
d2 = d2 + d1

L5:
a2 =g a1

L6:
d3 = &*(xba + $freq)
d1 = a2 << 1
d5 = a2 << 1
d4 = d3 + d1
a0 = d5 + d3
a8 = d4 - 2
nop
d4 =sh *a8
|| d3 =sh *a0
d3 = d3 - d4
br =[ne] L9
d2 =g [ne.ncvz] a2
d3 =[ne] d2 - a1

L7:
d1 = d1 - 2
|| d3 =sh *--a8
a2 = a2 - 1
|| d4 =sh *--a0


d3 = d3 - d4


br =[eq] L7
nop
nop

L8:
d2 =g a2
d3 = d2 - a1

L9:
br =[ge] L11
nop
nop
x0 = &*(xba + $index_to_char)
nop
a8 =ub *(a2 + x0)
x8 = &*(xba + $char_to_index)
a9 =ub *(a1 + x0)
*(a2 + x0) =b a9
*(a1 + x0) =b a8
*(a8 + x8) =b a1
*(a9 + x8) =b a2

L11:
d3 =sh *a0
d3 = d3 + 1
d3 = a2 - 0
|| *a0 =h d3
br =[le] L15
br =[le] L16
nop
le2 = L14 - 8
d2 = a2 - 1
d3 = (xba + $cum_freq)
lrs2 = d2
a0 = d1 + d3
nop

L13:
d1 =sh *--a0
d1 = d1 + 1
*a0 =h d1

L14:
br = L16
nop

L15:
nop

L16:
br = iprs
nop
nop
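
Read against its header comment, the routine above has the shape of an adaptive frequency-model update: halve all counts when the total reaches the limit (75 in the listing), move the coded symbol ahead of any symbols with the same count, then increment its count and the cumulative counts before it. The sketch below restates that in C under those assumptions; the Model structure, the three-symbol alphabet and model_init are hypothetical and chosen only to keep the example small.

```c
#include <assert.h>

enum { NSYM = 3, MAX_FREQUENCY = 75 };

typedef struct {
    short freq[NSYM + 1];             /* freq[0] is a sentinel of 0   */
    short cum_freq[NSYM + 1];         /* cum_freq[0] is the total     */
    unsigned char index_to_char[NSYM + 1];
    unsigned char char_to_index[NSYM];
} Model;

static void model_init(Model *m)
{
    int i;
    for (i = 0; i < NSYM; i++) {
        m->char_to_index[i] = (unsigned char)(i + 1);
        m->index_to_char[i + 1] = (unsigned char)i;
    }
    for (i = 0; i <= NSYM; i++) {
        m->freq[i] = 1;
        m->cum_freq[i] = (short)(NSYM - i);
    }
    m->freq[0] = 0;
    m->index_to_char[0] = 0;
}

static void update_model(Model *m, int symbol)
{
    int i, cum;

    assert(symbol >= 1 && symbol <= NSYM);
    if (m->cum_freq[0] == MAX_FREQUENCY) {      /* halve all counts */
        cum = 0;
        for (i = NSYM; i >= 0; i--) {
            m->freq[i] = (short)((m->freq[i] + 1) / 2);
            m->cum_freq[i] = (short)cum;
            cum += m->freq[i];
        }
    }
    /* find the first index sharing this symbol's count */
    for (i = symbol; m->freq[i] == m->freq[i - 1]; i--)
        ;
    if (i < symbol) {                           /* swap the two symbols */
        unsigned char ch_i = m->index_to_char[i];
        unsigned char ch_s = m->index_to_char[symbol];
        m->index_to_char[i] = ch_s;
        m->index_to_char[symbol] = ch_i;
        m->char_to_index[ch_i] = (unsigned char)symbol;
        m->char_to_index[ch_s] = (unsigned char)i;
    }
    m->freq[i]++;
    while (i > 0) {                             /* fix cumulative counts */
        i--;
        m->cum_freq[i]++;
    }
}
```

Keeping the symbols sorted by count is what lets the encoder use short cumulative-frequency scans for the most frequent symbols.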


.global $bit_plus_follow

/*******************************************************************
* *
* Function : bit_plus_follow2 *
* Args : none *
* *
* Description: *
* bit_plus_follow will be called to output several bits that *
* have been encoded by the arithmetic encoder. *
* *
* Return Values: *
* None *
* *
*******************************************************************/


$bit_plus_follow2:

d0 = &*(sp -= 8)
*(sp + 4) =w iprs
call = $new_output_bit
nop

d6 = d1
|| *(sp + 0) =w d6
d1 =sh *(xba + $bits_to_follow)
d1 = d1 - 0
br =[le] L36
nop
d1 =[gt] d6 - 0
d6 = 1 || d6 =[ne] a15

L34:
call = $new_output_bit
nop
d1 = d6
d1 =sh *(xba + $bits_to_follow)
d1 = d1 - 1
d1 =sh0 d1
d1 = d1 - 0
|| *(xba + $bits_to_follow) =h d1
br =[gt] L34
nop
nop

L36:
br = *(sp + 4)
nop

d6 =sw *(sp + 0)
d0 = &*(sp ++= 8)


.global $bit_index

/*******************************************************************
* *
* Function : new_output_bit2 *
* Args : d1 - the bit to append *
* *
* Description: *
* new_output_bit will be called to append a single bit to *
* the bitstream array. *
* *
* Return Values: *
* None *
* *
*******************************************************************/


$new_output_bit2:

d2 =sw *(xba + $STOP)
d2 = d2 - 1
br =[eq] iprs
d2 =uh *(xba + $bit_index)
d4 = d2 >>u 3
|| d3 =uw *(xba + $byte_stream)
a0 = d4 + d3
d0 =uh *(xba + $T_BYTES)
d4 = (d2&7)
|| d3 =ub *a0
d3 = d3 | (d1 << d4)
*a0 =b d3
|| d5 = d2 + 1

*(xba + $bit_index) =h d5
d5 = d5 - (d0 << 3)

br =g iprs
d1 = 1 || d1 =[lt] a15 ;calculate new STOP value
*(xba + $STOP) =w d1
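
The bit-append routine can be summarized as: do nothing once STOP is set, otherwise OR the bit into byte bit_index >> 3 at position bit_index & 7, advance bit_index, and set STOP when the byte budget T_BYTES is used up. The C sketch below follows that reading; the BitSink structure is a hypothetical stand-in for the global variables, and it assumes the stream buffer is zeroed beforehand, since the routine only ORs bits in.

```c
#include <assert.h>

typedef struct {
    unsigned char *bytes;    /* output stream, zero-initialized      */
    unsigned bit_index;      /* next bit position to write           */
    unsigned t_bytes;        /* byte budget for this stream          */
    int stop;                /* 1 once the budget is exhausted       */
} BitSink;

static void new_output_bit(BitSink *s, int bit)
{
    assert(s && s->bytes && s->t_bytes > 0);
    if (s->stop == 1)
        return;                                   /* budget used up */
    s->bytes[s->bit_index >> 3] |=
        (unsigned char)((bit & 1) << (s->bit_index & 7));
    s->bit_index++;
    s->stop = (s->bit_index >= s->t_bytes * 8u) ? 1 : 0;
}
```

Because writes stop exactly at the budget, the encoder can be cut off at any byte count and still produce a fixed-size stream, which is what the embedded coding scheme relies on.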


end_subpass3:
nop

/*******************************************************************
** **
** ztr.s (PP Program)
** **
** Written by Jim Witham, Code 472300D, 939-3599
** **
** File that contains the following assembly language subroutines:
** **
** $comp_ztr
** **
*******************************************************************/

.global $comp_ztr

/*******************************************************************
* *
* Function : $comp_ztr *
* Args : s *
* Passed in : d1 *
* *
* Description: *
* s - the subblock to process *
* *
* $comp_ztr will be called to calculate the ztl array for a *
* single subblock. *
* *
* Return Values: *
* None *
* *
*******************************************************************/


$comp_ztr:

lctl = 0x0 ;reset looping capability

d2 = d1 << 9
x0 = d2 + 0x1fe
nop
a0 = &*(dba + x0) ;stats_val[s][m=255] = stats_val + (s*256 + m)*(2 bytes/word)
;(0x1fe, 0x3fe, 0x5fe, 0x7fe, 0x9fe, 0xbfe)
d2 = d1 << 7
x0 = d2 + 0xc7e
nop
a2 = &*(dba + x0) ;ztl[s][p=63] = ztl + (s*64 + p)*(2 bytes/word)
nop ;(0xc7e, 0xcfe, 0xd7e, 0xdfe, 0xe7e, 0xefe)
a8 = a2

;clear out ztl[s][0..63] array
x0 = d2 + 0x0c00
nop
a1 = &*(dba + x0)
lrse0 = 31
d1 = 0
nop
*(a1++=[1]) = d1

lr0 = 3 ;first innerloop
lr1 = 3 ;second innerloop
lr2 = 11 ;outerloop
le0 = firstloopend2
ls0 = firstloopstart2
le1 = secondloopend2
ls1 = secondloopstart2
le2 = outerloopend2
ls2 = outerloopstart2
nop
lctl = 0xba9 ;associate le0 with lc0, le1 with lc1, and le2 with lc2
nop
nop
d4 =sh *(a0--=[1])

outerloopstart2:
nop

firstloopstart2:
d4 = |d4|
d4 = 31 - lmo(d4)
|| d2 =h *(a2)
d4 = 1<<d4
|| d3 =sh *(a0--=[1])
d3 = |d3|
d3 = 31 - lmo(d3)
d3 = 1<<d3
d2 = d2 | d4 | d3
|| d4 =sh *(a0--=[1])

firstloopend2:
*(a2--=[1]) =h d2
a7 =uh *(a2++=[4])

secondloopstart2:
d4 = |d4|
d4 = 31 - lmo(d4)
|| d2 =h *(a2)
d4 = 1<<d4
|| d3 =sh *(a0--=[1])
d3 = |d3|
d3 = 31 - lmo(d3)
d3 = 1<<d3
d2 = d2 | d4 | d3
|| d4 =sh *(a0--=[1])

secondloopend2:
*(a2--=[1]) =h d2

outerloopend2:
nop

lr0 = 1 ;first innerloop
lr1 = 1 ;second innerloop
lr2 = 5 ;outerloop
le0 = firstloopend3
ls0 = firstloopstart3
le1 = secondloopend3
ls1 = secondloopstart3
le2 = outerloopend3
ls2 = outerloopstart3
nop
nop

outerloopstart3:
nop

firstloopstart3:
d4 = |d4|
|| d3 =h *(a8)
d4 = 31 - lmo(d4)
|| d2 =h *(a2)
d4 = 1<<d4
d3 = d3 | d4
|| d4 =sh *(a0--=[1])
d2 = d2 | d3
|| *(a8--=[1]) =h d3
d4 = |d4|
|| *(a2) =h d2
d4 = 31 - lmo(d4)
|| d3 =h *(a8)
d4 = 1<<d4
|| d2 =h *(a2)
d3 = d3 | d4
|| d4 =sh *(a0--=[1])
d2 = d2 | d3
|| *(a8--=[1]) =h d3

firstloopend3:
*(a2--=[1]) =h d2
a7 =uh *(a2++=[2])

secondloopstart3:
d4 = |d4|
|| d3 =h *(a8)
d4 = 31 - lmo(d4)
|| d2 =h *(a2)
d4 = 1<<d4
d3 = d3 | d4
|| d4 =sh *(a0--=[1])
d2 = d2 | d3
|| *(a8--=[1]) =h d3
d4 = |d4|
|| *(a2) =h d2
d4 = 31 - lmo(d4)
|| d3 =h *(a8)
d4 = 1<<d4
|| d2 =h *(a2)
d3 = d3 | d4
|| d4 =sh *(a0--=[1])
d2 = d2 | d3
|| *(a8--=[1]) =h d3

secondloopend3:
*(a2--=[1]) =h d2

outerloopend3:
nop


lr0 = 2 ;first loop
le0 = firstloopend4
ls0 = firstloopstart4
nop
nop

firstloopstart4:
d4 = |d4|
|| d3 =h *(a8)
d4 = 31 - lmo(d4)
|| d2 =h *(a2)
d4 = 1<<d4
d3 = d3 | d4
|| d4 =sh *(a0--=[1])
d2 = d2 | d3
|| *(a8--=[1]) =h d3
d4 = |d4|
|| *(a2) =h d2
d4 = 31 - lmo(d4)
|| d3 =h *(a8)
d4 = 1<<d4
|| d2 =h *(a2)
d3 = d3 | d4
|| d4 =sh *(a0--=[1])
d2 = d2 | d3
|| *(a8--=[1]) =h d3
*(a2) =h d2
d4 = |d4|
|| d3 =h *(a8)
d4 = 31 - lmo(d4)
|| d2 =h *(a2)
d4 = 1<<d4
d3 = d3 | d4
|| d4 =sh *(a0--=[1])
d2 = d2 | d3
|| *(a8--=[1]) =h d3

d4 = |d4|
|| *(a2) =h d2
d4 = 31 - lmo(d4)
|| d3 =h *(a8)
d4 = 1<<d4
|| d2 =h *(a2)
d3 = d3 | d4
|| d4 =sh *(a0--=[1])
d2 = d2 | d3
|| *(a8--=[1]) =h d3
*(a2--=[1]) =h d2

firstloopend4:
nop

d4 = |d4|
|| d3 =h *(a8)
d4 = 31 - lmo(d4)
|| d2 =h *(a2)
d4 = 1<<d4
d3 = d3 | d4
|| d4 =sh *(a0--=[1])
d2 = d2 | d3
|| *(a8--=[1]) =h d3

d4 = |d4|
|| *(a2) =h d2
d4 = 31 - lmo(d4)
|| d3 =h *(a8)
d4 = 1<<d4
|| d2 =h *(a2)
d3 = d3 | d4
|| d4 =sh *(a0--=[1])
d2 = d2 | d3
|| *(a8--=[1]) =h d3

*(a2) =h d2
d4 = |d4|
|| d3 =h *(a8)
d4 = 31 - lmo(d4)
|| d2 =h *(a2)
d4 = 1<<d4
d3 = d3 | d4
|| d4 =sh *(a0--=[1])
d2 = d2 | d3
|| *(a8--=[1]) =h d3
d4 = |d4|
|| *(a2) =h d2
d4 = 31 - lmo(d4)
|| d3 =h *(a8)
d4 = 1<<d4
|| d2 =h *(a2)
d3 = d3 | d4
d2 = d2 | d3
|| *(a8) =h d3
*(a2) =h d2


br = iprs 

nop 

nop 
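
The core operation repeated throughout $comp_ztr is "take the magnitude of a coefficient, reduce it to its most significant bit, and OR that bit into a ztl entry". The sketch below shows that step in C, assuming that 31 - lmo(x) yields the index of the highest set bit; the result for a zero coefficient is not determined by the listing, so returning 0 here is an assumption, and msb_mask and ztl_merge are hypothetical names.

```c
#include <assert.h>

/* Largest power of two not exceeding |x|; 0 for x == 0 (assumed). */
static unsigned msb_mask(int x)
{
    unsigned mag = (x < 0) ? 0u - (unsigned)x : (unsigned)x;
    unsigned mask;

    if (mag == 0)
        return 0;
    mask = 1u;
    while (mag >>= 1)
        mask <<= 1;
    return mask;
}

/* OR the leading-bit masks of two coefficients into a ztl entry,
   as in "d2 = d2 | d4 | d3" in the first loop of the listing. */
static unsigned short ztl_merge(unsigned short entry, short c0, short c1)
{
    return (unsigned short)(entry | msb_mask(c0) | msb_mask(c1));
}
```

An entry built this way records, for every bit plane, whether any contributing coefficient first becomes significant at that plane, which is the information a zerotree test needs at each threshold.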